LINEAR ALGEBRA 


Jin Ho Kwak 
Sungpyo Hong 


Pohang University of Science and Technology 


POSTECH 


Preface 


Linear algebra is one of the most important subjects in the study of science 
and engineering because of its widespread applications in social or natural 
science, computer science, physics, or economics. As one of the most useful 
courses in undergraduate mathematics, it has provided essential tools for 
industrial scientists. The basic concepts of linear algebra are vector spaces, 
linear transformations, matrices and determinants, and they serve as an 
abstract language for stating ideas and solving problems. 

This book is based on lectures delivered over several years in a
sophomore-level linear algebra course designed for science and engineering
students. The primary purpose of this book is to give a careful presentation
of the basic concepts of linear algebra as a coherent part of mathematics,
and to illustrate its power and usefulness through applications to other
disciplines. To this end, we concentrate on the system Ax = b of linear
equations throughout the book. While trying to find the most general method
of solving linear systems, we introduce the basic concepts of linear algebra
and derive their basic properties. As a consequence of this basic theory,
we derive the most general formula for the solution of a general linear
system in Theorem 7.16.

We have also tried to emphasize computational skills along with the
mathematical abstractions, which also have an integrity and beauty of their
own. The book includes a variety of interesting applications with many
examples, not only to help students understand new concepts but also to let
them practice wide applications of the subject to such areas as differential
equations, statistics, geometry, and physics. Some of those applications
may not be central to the mathematical development and may be omitted
from a syllabus at the discretion of the instructor. Most basic concepts and
introductory motivations begin with examples in Euclidean space or with
solving a system of linear equations, and are gradually examined from
different points of view to derive general principles.

For students who have just finished a first-year calculus course, linear
algebra may be the first course in which the subject is developed in an
abstract way, and we often find that many students struggle with the
abstraction and miss the applications. Our experience is that, to
understand the material, students should practice on many problems, which
are the most likely to be omitted because of a lack of time. To encourage
repeated practice, we have placed in the middle of the text not only many
examples but also some carefully selected problems, with answers or helpful
hints. We have tried to make this book as accessible and clear as possible,
but there are certainly some awkward expressions in places. Any criticism
or comments from readers will be appreciated.

We are very grateful to many colleagues in Korea, especially to the
faculty members in the mathematics department at Pohang University of
Science and Technology (POSTECH), who helped us over the years with
various aspects of this book. For their valuable suggestions and comments,
we would like to thank the students at POSTECH who have used photocopied
versions of the text over the past several years. We would also like to
acknowledge the invaluable assistance we have received from the teaching
assistants who checked and added some answers or hints for the problems
and exercises in this book. Our thanks also go to Mrs. Kathleen Roush,
who made this book much more legible with her grammatical corrections to
the final manuscript. Finally, our thanks go to the editing staff of
Birkhauser for gladly accepting our book for publication.


Jin Ho Kwak 

Sungpyo Hong 

E-mail: jinkwak@postech.ac.kr 
sungpyo@postech.ac.kr 


“Linear algebra is the mathematics of our modern technological world of 
complex multivariable systems and computers” 
— Alan Tucker — 


“We (Halmos and Kaplansky) share a love of linear algebra. I think it
is our conviction that we'll never understand infinite-dimensional operators
properly until we have a decent mastery of finite matrices. And we share a
philosophy about linear algebra: we think basis-free, we write basis-free, but
when the chips are down we close the office door and compute with matrices
like fury”

— Irving Kaplansky — 


Preface for the 3rd edition


The following diagram illustrates the dependencies of the Chapters:

[Diagram of chapter dependencies: Chapter 1 Linear Equations and Matrices;
Chapter 2 Vector Spaces; Chapter 3 Linear Transformations; Chapter 4
Determinants; Chapter 5 Inner Products; Chapter 6 Diagonalization;
Chapter 7 Complex Vector Spaces; Chapter 8 Jordan Canonical Forms;
Chapter 9 Quadratic Forms]

In the third edition, the major changes from the second edition are the 
following: 




1. The Chapters are rearranged: the chapter on Determinants is placed
after the chapter on Linear Transformations, since the determinant is
introduced axiomatically, through the concept of a skew-symmetric
multilinear form, rather than by an inductive formula via cofactor
expansion.

2. Many sections and chapters are rearranged and rewritten: e.g.,
Sections 5.4 Gram-Schmidt orthogonalization, 5.5 Projections, and 5.6
Orthogonal projection are rewritten into 4 sections: 3.3 Projections,
5.4 Orthonormal bases, 5.5 Orthogonal projections, and 5.9 QR
decompositions.

Note that the main contents of the book up to Chapter 5 culminate in
Section 5.7 Least square solutions, which gives a general method of solving
systems of linear equations with rank n.

Section 5.10 QR decomposition I provides a more practical method of
computing the Gram-Schmidt orthogonalization. Section 5.11 QR decomposition
II describes some other methods of QR decomposition of A, which will be
useful in computing eigenvalues.

3. In Section 6.3.2 Discrete dynamical systems, the limit of a sequence
of matrices is discussed.

4. Sections 7.6, 7.7 and 7.8 are newly added. Theorem 7.16 in Section 7.7
concludes the theory of systems of linear equations in the sense that it
provides a general method of solving a system of linear equations. It also
has important applications in science and engineering. Section 7.8 shows
some practical methods of computing eigenvalues.

5. Chapter 8 Jordan canonical form is also rewritten. 

6. Chapter 9 Quadratic forms is also rewritten: it now includes some
methods of estimating the eigenvalues of a Hermitian matrix, and the special
theory of relativity is introduced as an application of an indefinite inner
product.

7. A number of examples and exercises are added to reflect the changes 
of the existing topics and the additional new material. 


The following sections may be skipped at the instructor's discretion:
§1.5, §1.9, §2.9, §3.8, §3.9, §5.9, §5.11, §6.3.1, §6.3.2 (except for the
concept of the limit of a sequence), §6.5, §7.2, §7.9, §8.3.4, §8.3.5, and
Chapter 9 if time is not enough.


Jin Ho Kwak 

Sungpyo Hong 

E-mail: jinkwak@postech.ac.kr 
sungpyo@postech.ac.kr 


Jan. 2006, in Pohang, Korea 


Contents

Preface v
Preface for the 3rd edition vii
1 Linear Equations and Matrices 1
1.1 Systems of linear equations 1
1.2 Gaussian elimination 5
1.3 Sums and scalar multiplications of matrices 14
1.4 Products of matrices 18
1.5 Block matrices 23
1.6 Inverse matrices 25
1.7 Elementary matrices and computing $A^{-1}$ 28
1.8 LDU decomposition 34
1.9 Applications 41
1.9.1 Cryptography 41
1.9.2 Electrical network 43
1.9.3 Leontief model 45
1.10 Exercises 48
2 Vector Spaces 53
2.1 The n-space $\mathbb{R}^n$ and vector spaces 53
2.2 Subspaces 58
2.3 Bases 61
2.4 Dimensions 67
2.5 Row and column spaces 72
2.6 Rank and nullity 76
2.7 Bases for subspaces 81
2.8 Invertibility 87
2.9 Applications 89
2.9.1 Interpolation I 89
2.9.2 Interpolation II 91
2.9.3 The Wronskian 93
2.10 Exercises 94
3 Linear Transformations 99
3.1 Basic properties of linear transformations 99
3.2 Invertible linear transformations 105
3.3 Projections 110
3.4 Matrices of linear transformations 113
3.5 Vector spaces of linear transformations 118
3.6 Change of bases 121
3.7 Similarity 126
3.8 Dual spaces 131
3.9 Applications 136
3.9.1 Computer graphics 136
3.10 Exercises 140
4 Determinants 145
4.1 Basic properties of the determinants 145
4.2 Existence and uniqueness of the determinant 150
4.3 Cofactor expansion 156
4.4 Applications 162
4.4.1 Inverse matrices and Cramer's rule 162
4.4.2 Area and volume 165
4.5 Exercises 170
5 Inner Product Spaces 175
5.1 Inner products 175
5.2 Matrix representations of inner products 178
5.3 The lengths and angles of vectors 180
5.4 Orthonormal bases 182
5.5 Orthogonal projections 186
5.6 Relations of fundamental subspaces 192
5.7 Least square solutions 195
5.8 Applications: Approximations by polynomials 202
5.9 Orthogonal matrices and isometries 206
5.10 QR decomposition I 210
5.11 QR decomposition II 217
5.12 Exercises 219
6 Diagonalization
6.1 Diagonalization of matrices
6.2 Eigenvalues and eigenvectors
6.3 Applications
6.3.1 Linear difference equations I
6.3.2 Discrete dynamical systems I
6.3.3 Linear differential equations I
6.4 Exponential matrices
6.5 Diagonalization of linear transformations
6.6 Exercises
7 Complex Vector Spaces
7.1 The complex n-space $\mathbb{C}^n$
7.2 Hermitian and unitary matrices
7.3 Unitary diagonalization
7.4 Normal matrices
7.5 The spectral decomposition
7.6 Singular value decomposition
7.7 The solutions of the general systems
7.8 The eigenvalue problem
7.8.1 Jacobi method
7.8.2 QR method
7.8.3 The power and inverse power method
7.9 Exercises
8 Jordan Canonical Forms
8.1 Introduction
8.2 Generalized eigenvectors
8.3 Applications of the Jordan canonical forms
8.3.1 Computation of the powers $A^k$ of $A$
8.3.2 Computation of the exponential matrix $e^A$
8.3.3 Linear differential equations II
8.3.4 Linear difference equations II
8.3.5 Discrete dynamical systems II
8.4 Cayley-Hamilton theorem
8.4.1 Application to the inverse matrices
8.4.2 Application to matrix polynomials
8.4.3 Applications to computations of $A^k$ and $e^A$ II
8.5 Exercises
9 Quadratic Forms 363
9.1 Introduction 363
9.2 Diagonalization of a quadratic form 366
9.3 Congruence relations 373
9.4 Characterization of quadratic forms 379
9.5 Application to extrema of functions on $\mathbb{R}^2$ 383
9.6 Bilinear forms 387
9.7 Duality 391
9.8 Hermitian forms 396
9.9 The eigenvalues of Hermitian forms 400
9.10 Indefinite forms in special theory of relativity 407
9.10.1 Lorentzian space-time 412
9.10.2 Minkowski space-time 416
9.10.3 Particles observed 417
9.10.4 Some relativistic effects 420
9.11 Skew-symmetric forms in symplectic geometry 423
9.12 Decomplexification and complexification 430
9.13 Exercises 434
Selected Answers and Hints 438
Bibliography 461
Index 462


Chapter 1


Linear Equations and Matrices


1.1 Systems of linear equations 


One of the central motivations for linear algebra is solving systems of linear 
equations. We thus begin with the problem of finding the solutions of a 
system of m linear equations in n unknowns of the following form: 


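$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2\\
&\ \ \vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m,
\end{aligned}
$$
where $x_1, x_2, \ldots, x_n$ are the unknowns and the $a_{ij}$'s and $b_i$'s
denote constant (real or complex) numbers.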

A sequence of numbers $(s_1, s_2, \ldots, s_n)$ is called a solution of the
system if $x_1 = s_1, x_2 = s_2, \ldots, x_n = s_n$ satisfy each equation in the
system simultaneously. When $b_1 = b_2 = \cdots = b_m = 0$, we say that the
system is homogeneous.
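To make these two definitions concrete, here is a small Python sketch. It is an illustration, not part of the original text. The helper names `is_solution` and `is_homogeneous` are hypothetical. The system is stored as its coefficients `A[i][j]` $= a_{i+1,j+1}$ (Python indices start at 0) and its right-hand sides `b[i]` $= b_{i+1}$, and exact `Fraction` arithmetic is used. The sample system is the one solved in Example 1.3 of the next section.

```python
from fractions import Fraction

def is_solution(A, b, s):
    """True if x_j = s[j] satisfies every equation sum_j A[i][j] * x_j = b[i]."""
    return all(
        sum(Fraction(a) * Fraction(x) for a, x in zip(row, s)) == Fraction(bi)
        for row, bi in zip(A, b)
    )

def is_homogeneous(b):
    """A system is homogeneous when every right-hand side b_i is zero."""
    return all(Fraction(bi) == 0 for bi in b)

# 2y + 4z = 2, x + 2y + 2z = 3, 3x + 4y + 6z = -1
A = [[0, 2, 4], [1, 2, 2], [3, 4, 6]]
b = [2, 3, -1]
print(is_solution(A, b, [-3, 5, -2]))          # True
print(is_solution(A, b, [0, 0, 0]))            # False
print(is_homogeneous(b))                       # False
print(is_solution(A, [0, 0, 0], [0, 0, 0]))    # True: the trivial solution
```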


The central topic of this chapter is to examine whether or not a given
system has a solution, and to find a solution if it has one. For instance,
any homogeneous system always has at least one solution,
$x_1 = x_2 = \cdots = x_n = 0$, called the trivial solution. One may naturally
ask whether such a homogeneous system has a non-trivial solution. If so, we
would like a systematic method of finding all the solutions. A system of
linear equations is said to be consistent if it has at least one solution,
and inconsistent if it has no solution.




For example, suppose that the system has only one linear equation
$$a_1x_1 + a_2x_2 + \cdots + a_nx_n = b.$$
If $a_i = 0$ for $i = 1, \ldots, n$, then the equation becomes $0 = b$. Thus it
has no solution if $b \neq 0$ (nonhomogeneous), and it has infinitely many
solutions (any $n$ numbers $x_i$ form a solution) if $b = 0$ (homogeneous).

In any case, an equation in which all the coefficients are zero is
trivial. In this book, when we speak of a system of linear equations, we
always assume, unless otherwise mentioned, that no equation of the system
has all of its coefficients equal to zero.


Example 1.1 The system of one equation in two unknowns $x$ and $y$ is
$$ax + by = c,$$
in which at least one of $a$ and $b$ is nonzero. Geometrically, this equation
represents a straight line in the $xy$-plane. Therefore, a point $P = (x, y)$
(more precisely, the pair of coordinates $x$ and $y$) is a solution if and
only if the point $P$ lies on the line. Thus there are infinitely many
solutions, namely all the points on the line.


Example 1.2 The system of two equations in two unknowns $x$ and $y$ is
$$
\begin{aligned}
a_1x + b_1y &= c_1\\
a_2x + b_2y &= c_2.
\end{aligned}
$$


Solution: (I) Geometric method: Since the equations represent two straight
lines in the $xy$-plane, only three types are possible, as shown in Figure 1.1.


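[Figure 1.1: Three types of solution sets. Case (1): $x - y = -1$ and
$2x - 2y = -2$ (the lines coincide); Case (2): two parallel, distinct lines,
one of them $x - y = 0$; Case (3): two lines crossing at a point, one of them
$x - y = 0$.]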




Since a solution is a point lying on both lines simultaneously, by looking 
at the graphs in Figure 1.1 one can see that only the following three types 
of solution sets are possible: 


(1) the straight line itself if they coincide, 
(2) the empty set if the lines are parallel and distinct, 


(3) only one point if they cross at a point. 


(II) Algebraic method: Case (1) (the two lines coincide): The two equations
represent the same straight line, that is, one equation is a nonzero
constant multiple of the other. This condition is equivalent to
$$a_2 = \lambda a_1, \quad b_2 = \lambda b_1, \quad c_2 = \lambda c_1 \quad \text{for some } \lambda \neq 0.$$
In this case, if a point $(s, t)$ satisfies one equation, then it automatically
satisfies the other too. Thus there are infinitely many solutions, namely all
the points on the line.

Case (2) (the two lines are parallel but distinct): That is, $a_2 = \lambda a_1$,
$b_2 = \lambda b_1$, but $c_2 \neq \lambda c_1$ for some $\lambda \neq 0$ (the first two
equalities are equivalent to $a_1b_2 - a_2b_1 = 0$). Then no point $(s, t)$ can
satisfy both equations simultaneously, so there is no solution.

Case (3) (the two lines cross at a point): That is, the two lines have
distinct slopes, which means $a_1b_2 - a_2b_1 \neq 0$. In this case, they cross
at a point (the only solution), which can be found by the elementary method
of elimination and substitution. The following computation shows how to do
this.


Without loss of generality, one may assume $a_1 \neq 0$ by interchanging the
two equations if necessary. (If both $a_1$ and $a_2$ are zero, the system
reduces to a system in one variable.)


(1) Elimination: The variable $x$ can be eliminated from the second equation
by adding $-\dfrac{a_2}{a_1}$ times the first equation to the second, to get
$$
\begin{aligned}
a_1x + b_1y &= c_1\\
\left(b_2 - \frac{a_2}{a_1}b_1\right)y &= c_2 - \frac{a_2}{a_1}c_1.
\end{aligned}
$$




(2) Since $a_1b_2 - a_2b_1 \neq 0$, $y$ can be found by multiplying the second
equation by the nonzero number $\dfrac{a_1}{a_1b_2 - a_2b_1}$ to get
$$
\begin{aligned}
a_1x + b_1y &= c_1\\
y &= \frac{a_1c_2 - a_2c_1}{a_1b_2 - a_2b_1}.
\end{aligned}
$$


(3) Substitution: By substituting the value of $y$ from the second equation
into the first equation, we obtain a new system:
$$
\begin{aligned}
x &= \frac{b_2c_1 - b_1c_2}{a_1b_2 - a_2b_1}\\
y &= \frac{a_1c_2 - a_2c_1}{a_1b_2 - a_2b_1}.
\end{aligned}
$$

It is easy to see that the solution of this system is also a solution of the
original system, and vice versa. Note that the condition
$a_1b_2 - a_2b_1 \neq 0$ is necessary for the system to have only one solution.
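The case analysis of Example 1.2 translates directly into a short classification routine. The sketch below is hypothetical: `solve_2x2` is not from the text. Like the book, it assumes that neither equation has both coefficients zero. It uses $a_1b_2 - a_2b_1$ to separate Case (3) from Cases (1) and (2), and it takes the unique solution from the formulas of the substitution step. When $a_1b_2 - a_2b_1 = 0$, the lines coincide exactly when $c_2 = \lambda c_1$ as well. The sketch tests this through the two minors $a_1c_2 - a_2c_1$ and $b_1c_2 - b_2c_1$, which avoids computing $\lambda$ explicitly.

```python
from fractions import Fraction

def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Classify and solve a1*x + b1*y = c1, a2*x + b2*y = c2.

    Assumes (a1, b1) != (0, 0) and (a2, b2) != (0, 0).
    Returns ("unique", (x, y)), ("infinite", None) or ("none", None).
    """
    a1, b1, c1, a2, b2, c2 = map(Fraction, (a1, b1, c1, a2, b2, c2))
    d = a1 * b2 - a2 * b1
    if d != 0:
        # Case (3): the lines cross at one point
        x = (b2 * c1 - b1 * c2) / d
        y = (a1 * c2 - a2 * c1) / d
        return "unique", (x, y)
    # d == 0: (a2, b2) = lambda * (a1, b1); the lines coincide iff c2 = lambda * c1
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return "infinite", None          # Case (1)
    return "none", None                  # Case (2)

print(solve_2x2(1, -1, -1, 2, -2, -2))   # lines of Case (1) in Figure 1.1: infinite
print(solve_2x2(1, -1, -1, 1, -1, 0))    # parallel, distinct: none
print(solve_2x2(1, 1, 1, 1, -1, 0))      # x = y = 1/2
```

Exact fractions are used so that the test `d != 0` is not disturbed by rounding. With floating-point input, one would compare against a tolerance instead.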


In Example 1.2, the original system of equations has been transformed,
through certain operations called elimination and substitution, into a
simpler system that directly displays the solution of the given system.
That is, if $(x, y)$ satisfies the original system of equations, then it also
satisfies the simpler system in (3), and vice versa. As in Example 1.2, we
will see later that any system of linear equations has either no solution,
exactly one solution, or infinitely many solutions (see Theorem 1.6).


[Figure 1.2: Three planes in $\mathbb{R}^3$. Panels: Infinitely many solutions;
Only one solution; No solutions.]




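Note that an equation $ax + by + cz = d$ with $(a, b, c) \neq (0, 0, 0)$ in three
unknowns represents a plane in the 3-space $\mathbb{R}^3$. The solution set includes
$$
\begin{aligned}
&\{(x, y, 0) \mid ax + by = d\} \ \text{in the } xy\text{-plane},\\
&\{(x, 0, z) \mid ax + cz = d\} \ \text{in the } xz\text{-plane},\\
&\{(0, y, z) \mid by + cz = d\} \ \text{in the } yz\text{-plane}.
\end{aligned}
$$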


One can also examine the various possible types of the solution set of a
system of three equations in three unknowns; Figure 1.2 illustrates three
possible cases.


Problem 1.1 For a system of three linear equations in three unknowns
$$
\begin{aligned}
a_{11}x + a_{12}y + a_{13}z &= b_1\\
a_{21}x + a_{22}y + a_{23}z &= b_2\\
a_{31}x + a_{32}y + a_{33}z &= b_3,
\end{aligned}
$$
describe all the possible types of the solution set in the 3-space $\mathbb{R}^3$.


1.2 Gaussian elimination 


A basic idea for solving a system of linear equations is to transform the
given system into a simpler system while keeping the solution set
unchanged; Example 1.2 shows how this can be done. In fact, the operations
used in Example 1.2 are essentially the following three, called elementary
operations:


(1) multiply an equation throughout by a nonzero constant,
(2) interchange two equations,
(3) add a constant multiple of one equation to another equation.


Moreover, each of the three elementary operations has an inverse
operation, which is also an elementary operation:

(1’) multiply the equation by the reciprocal of the same nonzero constant,
(2’) interchange the two equations again,
(3’) add the negative of the same constant multiple of the equation to the
other.




Therefore, by applying a finite sequence of elementary operations to the
given original system, one obtains a new system, and by applying the
inverse operations in reverse order to the new system, one can recover the
original system.

Consequently, as we have seen in Example 1.2, the original system and the
new system have the same solutions. That is, if the $x_i$'s satisfy the
original equations, then they also satisfy any system obtained from them by
the three operations, and vice versa.


Definition 1.1 Two systems of linear equations are said to be equivalent 
if one can be transformed to the other by a finite sequence of elementary 
operations. 


Theorem 1.1 If two systems of linear equations are equivalent, then they
have the same set of solutions.


Thus one can solve a given system by applying the three elementary
operations to it finitely many times, until it becomes a simple equivalent
system whose solutions can be read off.

These arguments can be formalized in mathematical language. Observe that
in performing any of these three elementary operations, only the
coefficients of the variables are involved, while the variables
$x_1, x_2, \ldots, x_n$ and the equal sign "=" are simply repeated. Thus,
keeping the places of the variables and "=" in mind, we pick out only the
coefficients from the given system of equations and form a rectangular
array of numbers as follows:


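$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} & b_1\\
a_{21} & a_{22} & \cdots & a_{2n} & b_2\\
\vdots & \vdots & & \vdots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn} & b_m
\end{bmatrix}
$$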


This matrix is called the augmented matrix of the system. The term
matrix means just any rectangular array of numbers, and the numbers in
this array are called the entries of the matrix. In the following sections,
we shall discuss matrices in general. For the moment, we restrict our
attention to the augmented matrix of a system.
Within an augmented matrix, the horizontal and vertical subarrays
$$
\begin{bmatrix} a_{i1} & a_{i2} & \cdots & a_{in} & b_i \end{bmatrix}
\quad\text{and}\quad
\begin{bmatrix} a_{1j}\\ a_{2j}\\ \vdots\\ a_{mj} \end{bmatrix}
$$
are called the $i$-th row (matrix), which represents the $i$-th equation, and
the $j$-th column (matrix), which consists of the coefficients of the $j$-th
variable $x_j$, of the augmented matrix, respectively. The matrix consisting
of the first $n$ columns of the augmented matrix


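$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
$$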


is called the coefficient matrix of the system. 

One can easily see that there is a one-to-one correspondence between the
columns of the coefficient matrix and the variables of the system. Note also
that the last column $[b_1\ b_2\ \cdots\ b_m]^T$ of the augmented matrix
represents only the non-homogeneity of the system, so no variable
corresponds to it.

Since each row of the augmented matrix contains all the information of
the corresponding equation of the system, we may deal with the augmented
matrix instead of handling the whole system of linear equations, and the
elementary operations may be applied to an augmented matrix just as they
are applied to a system of equations.

In this setting, the elementary operations are rephrased as the elementary
row operations on the augmented matrix:

(E1) multiply a row throughout by a nonzero constant,
(E2) interchange two rows,
(E3) add a constant multiple of one row to another row.

The inverse row operations are also elementary row operations:

(E1’) multiply the row by the reciprocal of the same constant,
(E2’) interchange the two rows again,
(E3’) add the negative of the same constant multiple of the row to the other.
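The three elementary row operations and their inverses can be sketched as small Python functions. The names `scale`, `swap` and `add_multiple` are hypothetical, and rows are indexed from 0. Each function returns a new matrix instead of modifying its argument. This makes it easy to check that applying the inverse operations in reverse order recovers the original augmented matrix, here the one of Example 1.3.

```python
from fractions import Fraction

def scale(M, i, c):
    """(E1) multiply row i by the nonzero constant c; returns a new matrix."""
    if c == 0:
        raise ValueError("the constant must be nonzero")
    M = [row[:] for row in M]
    M[i] = [c * x for x in M[i]]
    return M

def swap(M, i, j):
    """(E2) interchange rows i and j; returns a new matrix."""
    M = [row[:] for row in M]
    M[i], M[j] = M[j], M[i]
    return M

def add_multiple(M, src, dst, c):
    """(E3) add c times row src to row dst (src != dst); returns a new matrix."""
    if src == dst:
        raise ValueError("the two rows must be different")
    M = [row[:] for row in M]
    M[dst] = [x + c * y for x, y in zip(M[dst], M[src])]
    return M

# Augmented matrix of Example 1.3, with exact fractions
A = [[Fraction(v) for v in row] for row in [[0, 2, 4, 2], [1, 2, 2, 3], [3, 4, 6, -1]]]

# Apply (E2), (E1), (E3) in that order ...
B = add_multiple(scale(swap(A, 0, 1), 2, Fraction(1, 3)), 0, 2, Fraction(-1))
# ... then undo them with (E3'), (E1'), (E2') in reverse order.
C = swap(scale(add_multiple(B, 0, 2, Fraction(1)), 2, Fraction(3)), 0, 1)
print(B == A, C == A)  # False True
```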


Note that if a matrix B can be obtained from a matrix A by elementary row
operations, then one can recover A from B by applying the inverse
elementary row operations to B in reverse order. Two such matrices are
called row-equivalent, and systems whose augmented matrices are
row-equivalent have the same solutions.

The general procedure for finding the solutions will be illustrated in the 
following example: 




Example 1.3 Solve the system of linear equations:
$$
\begin{aligned}
2y + 4z &= 2\\
x + 2y + 2z &= 3\\
3x + 4y + 6z &= -1.
\end{aligned}
$$


Solution: One could work with the augmented matrix only. However, to 
compare the operations on the system of linear equations with those on the 
augmented matrix, we work on the system and the augmented matrix in 
parallel. Note that the associated augmented matrix for the system is 


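$$
\begin{bmatrix}
0 & 2 & 4 & 2\\
1 & 2 & 2 & 3\\
3 & 4 & 6 & -1
\end{bmatrix}.
$$
(1) Since the coefficient of $x$ in the first equation is zero while that in
the second equation is not, we interchange these two equations:
$$
\begin{aligned}
x + 2y + 2z &= 3\\
2y + 4z &= 2\\
3x + 4y + 6z &= -1
\end{aligned}
\qquad
\begin{bmatrix}
1 & 2 & 2 & 3\\
0 & 2 & 4 & 2\\
3 & 4 & 6 & -1
\end{bmatrix}
$$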


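(2) Add $-3$ times the first equation to the third equation:
$$
\begin{aligned}
x + 2y + 2z &= 3\\
2y + 4z &= 2\\
-2y &= -10
\end{aligned}
\qquad
\begin{bmatrix}
1 & 2 & 2 & 3\\
0 & 2 & 4 & 2\\
0 & -2 & 0 & -10
\end{bmatrix}
$$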


Thus, the first variable $x$ is eliminated from the second and the third
equations. In this process, the coefficient 1 of the first unknown $x$ in the
first equation (row) is called the first pivot.

Consequently, the second and the third equations involve only the two
unknowns $y$ and $z$. Leaving the first equation (row) alone, one can apply
the same elimination procedure to the second and the third equations
(rows): the pivot used to eliminate $y$ from the last equation is the
coefficient 2 of $y$ in the second equation (row).

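(3) Add 1 times the second equation (row) to the third equation (row):
$$
\begin{aligned}
x + 2y + 2z &= 3\\
2y + 4z &= 2\\
4z &= -8
\end{aligned}
\qquad
\begin{bmatrix}
1 & 2 & 2 & 3\\
0 & 2 & 4 & 2\\
0 & 0 & 4 & -8
\end{bmatrix}
$$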




The elimination process (i.e., (1): row interchanging, (2): elimination of 
x from the last two equations (rows), and then (3): elimination of y from 
the last equation (row)) done so far to obtain this result is called forward 
elimination. After this forward elimination, the leftmost nonzero entries 
in the nonzero rows are called the pivots. Thus the pivots of the second and 
third rows are 2 and 4, respectively. 

(4) Normalize each nonzero row by dividing it by its pivot. Then the pivots
are replaced by 1:
$$
\begin{aligned}
x + 2y + 2z &= 3\\
y + 2z &= 1\\
z &= -2
\end{aligned}
\qquad
\begin{bmatrix}
1 & 2 & 2 & 3\\
0 & 1 & 2 & 1\\
0 & 0 & 1 & -2
\end{bmatrix}
$$


The resulting matrix on the right side is called a row-echelon form of 
the augmented matrix, and the 1’s at the pivotal positions are called the 
leading 1’s. The process so far is called the Gaussian elimination. 

The last equation gives $z = -2$. Substituting $z = -2$ into the second
equation gives $y = 5$. Now, putting these two values into the first
equation, we get $x = -3$. This process is called back substitution. On the
augmented matrix, the same computation amounts to eliminating the entries
above the leading 1's:

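(5) Add $-2$ times the third row to the second and the first rows:
$$
\begin{aligned}
x + 2y &= 7\\
y &= 5\\
z &= -2
\end{aligned}
\qquad
\begin{bmatrix}
1 & 2 & 0 & 7\\
0 & 1 & 0 & 5\\
0 & 0 & 1 & -2
\end{bmatrix}
$$
(6) Add $-2$ times the second row to the first row:
$$
\begin{aligned}
x &= -3\\
y &= 5\\
z &= -2
\end{aligned}
\qquad
\begin{bmatrix}
1 & 0 & 0 & -3\\
0 & 1 & 0 & 5\\
0 & 0 & 1 & -2
\end{bmatrix}
$$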


This resulting matrix is called the reduced row-echelon form of the 
augmented matrix, which is row-equivalent to the original augmented matrix 
and gives the solution to the system. The whole process to obtain the 
reduced row-echelon form is called the Gauss-Jordan elimination. 
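The two phases of Example 1.3 can be sketched in Python: forward elimination followed by normalization (Gaussian elimination), and then back substitution. This is a minimal illustration, not the book's own code. Several choices are assumptions made for the sketch:

- The function names `row_echelon` and `back_substitute` are hypothetical.
- Arithmetic is exact, using `Fraction`.
- The pivot in each column is the first nonzero entry at or below the current row. This choice reproduces the interchange in step (1).
- `back_substitute` assumes a square system with a leading 1 in every variable column.

```python
from fractions import Fraction

def row_echelon(M):
    """Gaussian elimination: forward elimination, then divide each nonzero
    row by its pivot so that the pivots become leading 1's."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0])
    r = 0                      # row where the next pivot will be placed
    pivots = []                # (row, column) positions of the pivots
    for c in range(n):
        if r == m:
            break
        # first row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, m) if A[i][c] != 0), None)
        if p is None:
            continue           # no pivot in this column
        A[r], A[p] = A[p], A[r]            # (E2) interchange rows
        for i in range(r + 1, m):          # (E3) clear entries below the pivot
            k = A[i][c] / A[r][c]
            A[i] = [x - k * y for x, y in zip(A[i], A[r])]
        pivots.append((r, c))
        r += 1
    for i, c in pivots:                    # (E1) normalize each pivot row
        piv = A[i][c]
        A[i] = [x / piv for x in A[i]]
    return A

def back_substitute(R):
    """Solve an n x (n+1) row-echelon augmented matrix whose leading 1's lie
    on the diagonal, starting from the last equation."""
    n = len(R)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        # row i reads x_i + sum_{j > i} R[i][j] * x_j = R[i][n]
        x[i] = R[i][n] - sum(R[i][j] * x[j] for j in range(i + 1, n))
    return x

# Example 1.3: 2y + 4z = 2, x + 2y + 2z = 3, 3x + 4y + 6z = -1
M = [[0, 2, 4, 2], [1, 2, 2, 3], [3, 4, 6, -1]]
R = row_echelon(M)
print([[str(v) for v in row] for row in R])  # the row-echelon form of step (4)
print([str(v) for v in back_substitute(R)])  # ['-3', '5', '-2']
```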


In summary, by applying a finite sequence of elementary row operations,
the augmented matrix of a system of linear equations can be transformed
into its reduced row-echelon form, which is row-equivalent to the original
one; hence the two corresponding systems have the same solutions. From the
reduced row-echelon form, one can easily decide whether or not the system
has a solution, and find the solutions of the given system if it is
consistent.
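A compact sketch of Gauss-Jordan elimination in Python follows. It illustrates the idea under stated assumptions and does not reproduce the book's procedure verbatim. The name `rref` is hypothetical, and arithmetic is exact via `Fraction`. Unlike the two-phase presentation of Example 1.3, this variant clears the entries both above and below each pivot in a single pass. Because the reduced row-echelon form of a matrix is unique, both orders give the same result.

```python
from fractions import Fraction

def rref(M):
    """Gauss-Jordan elimination: return the reduced row-echelon form of M."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0])
    r = 0                                   # row where the next pivot goes
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if A[i][c] != 0), None)
        if p is None:
            continue                        # no pivot in this column
        A[r], A[p] = A[p], A[r]             # (E2) bring the pivot row up
        piv = A[r][c]
        A[r] = [x / piv for x in A[r]]      # (E1) make a leading 1
        for i in range(m):                  # (E3) clear the rest of column c
            if i != r and A[i][c] != 0:
                k = A[i][c]
                A[i] = [x - k * y for x, y in zip(A[i], A[r])]
        r += 1
    return A

# The system of Example 1.5 below
M = [[1, 3, -2, 0, 3], [2, 6, -2, 4, 18], [0, 1, 1, 3, 10]]
for row in rref(M):
    print([str(x) for x in row])
```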




Definition 1.2 A row-echelon form of an augmented matrix is a matrix of the
following form:

(1) The zero rows, if any, come last in the order of rows.
(2) The first nonzero entry in each nonzero row is 1, called a leading 1.
(3) Below each leading 1 is a column of zeros. Thus, in any two consecutive
nonzero rows, the leading 1 in the lower row appears farther to the right
than the leading 1 in the upper row.

The reduced row-echelon form of an augmented matrix is a row-echelon form
that, in addition, satisfies:

(4) Above each leading 1 is a column of zeros.


Example 1.4 The first three augmented matrices below are in reduced 
row-echelon form, and the last one is just in row-echelon form. 


100 12036 1032 1326 
010|, OO A ae Oe ery BN Oe a a 
00 0 00000 0001 0013 


Recall that in an augmented matrix $[A\ b]$, the last column $b$ does not
correspond to any variable. Thus, if the reduced row-echelon form of an
augmented matrix for a nonhomogeneous system has a row of the form
$[0\ 0\ \cdots\ 0\ b]$ with $b \neq 0$, then the associated equation is
$0x_1 + 0x_2 + \cdots + 0x_n = b$ with $b \neq 0$, which means the system is
inconsistent. If $b = 0$, then the row contains only 0's and can be deleted.
In this example, the third matrix shows the former case, and the first two
matrices show the latter case.
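The test described above is easy to automate. In this hypothetical sketch, `is_consistent` looks for a row $[0\ 0\ \cdots\ 0\ b]$ with $b \neq 0$ in a row-echelon (or reduced row-echelon) augmented matrix, and `drop_zero_rows` deletes the rows that contain only 0's. The sample matrices are illustrative and are not the matrices of Example 1.4.

```python
from fractions import Fraction

def is_consistent(R):
    """R = [A b] in (reduced) row-echelon form. The system is inconsistent
    exactly when some row reads [0 0 ... 0 b] with b != 0."""
    for row in R:
        coeffs, rhs = row[:-1], row[-1]
        if all(Fraction(a) == 0 for a in coeffs) and Fraction(rhs) != 0:
            return False
    return True

def drop_zero_rows(R):
    """Rows consisting only of 0's carry no information and can be deleted."""
    return [row for row in R if any(Fraction(x) != 0 for x in row)]

print(is_consistent([[1, 0, 2], [0, 1, 3], [0, 0, 1]]))  # False: last row is 0 = 1
print(is_consistent([[1, 0, 2], [0, 1, 3], [0, 0, 0]]))  # True
print(drop_zero_rows([[1, 0, 2], [0, 1, 3], [0, 0, 0]]))  # [[1, 0, 2], [0, 1, 3]]
```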


In the following example, we use the Gauss-Jordan elimination again to 
solve a system which has infinitely many solutions. 


Example 1.5 Solve the following system of linear equations by Gauss-Jordan
elimination.
$$
\begin{aligned}
x_1 + 3x_2 - 2x_3 &= 3\\
2x_1 + 6x_2 - 2x_3 + 4x_4 &= 18\\
x_2 + x_3 + 3x_4 &= 10.
\end{aligned}
$$




Solution: The augmented matrix for the system is
$$
\begin{bmatrix}
1 & 3 & -2 & 0 & 3\\
2 & 6 & -2 & 4 & 18\\
0 & 1 & 1 & 3 & 10
\end{bmatrix}.
$$


The Gaussian elimination begins with:
(1) Adding $-2$ times the first row to the second produces
$$
\begin{bmatrix}
1 & 3 & -2 & 0 & 3\\
0 & 0 & 2 & 4 & 12\\
0 & 1 & 1 & 3 & 10
\end{bmatrix}.
$$


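(2) Note that the coefficient of $x_2$ in the second equation is zero, while
that in the third equation is not. Thus, interchanging the second and the
third rows produces
$$
\begin{bmatrix}
1 & 3 & -2 & 0 & 3\\
0 & 1 & 1 & 3 & 10\\
0 & 0 & 2 & 4 & 12
\end{bmatrix}.
$$
(3) Dividing the third row by the pivot 2 produces a row-echelon form
$$
\begin{bmatrix}
1 & 3 & -2 & 0 & 3\\
0 & 1 & 1 & 3 & 10\\
0 & 0 & 1 & 2 & 6
\end{bmatrix}.
$$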


We now continue with back substitution:
(4) Adding $-1$ times the third row to the second, and 2 times the third row
to the first, produces
$$
\begin{bmatrix}
1 & 3 & 0 & 4 & 15\\
0 & 1 & 0 & 1 & 4\\
0 & 0 & 1 & 2 & 6
\end{bmatrix}.
$$


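(5) Finally, adding $-3$ times the second row to the first produces the
reduced row-echelon form:
$$
\begin{bmatrix}
1 & 0 & 0 & 1 & 3\\
0 & 1 & 0 & 1 & 4\\
0 & 0 & 1 & 2 & 6
\end{bmatrix}.
$$
The corresponding system of equations is
$$
\begin{aligned}
x_1 + x_4 &= 3\\
x_2 + x_4 &= 4\\
x_3 + 2x_4 &= 6.
\end{aligned}
$$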




This system can be rewritten as follows:
$$
\begin{aligned}
x_1 &= 3 - x_4\\
x_2 &= 4 - x_4\\
x_3 &= 6 - 2x_4.
\end{aligned}
$$
Since there is no other condition on $x_4$, all the other variables $x_1$,
$x_2$, and $x_3$ are uniquely determined once an arbitrary real value
$t \in \mathbb{R}$ is assigned to $x_4$ ($\mathbb{R}$ denotes the set of real numbers). Thus
the solutions can be written as
$$(x_1, x_2, x_3, x_4) = (3 - t,\ 4 - t,\ 6 - 2t,\ t), \quad t \in \mathbb{R}.$$
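As a sanity check (an illustration, not a proof), one can substitute the parametric solution back into the original equations of Example 1.5 for a few values of $t$. The helper names `solution` and `residual` are hypothetical. `residual(x)` computes $b - Ax$, which must be the zero vector for every $t$.

```python
from fractions import Fraction

# Example 1.5: coefficient matrix and right-hand side
A = [[1, 3, -2, 0], [2, 6, -2, 4], [0, 1, 1, 3]]
b = [3, 18, 10]

def solution(t):
    """The general solution (3 - t, 4 - t, 6 - 2t, t), with x4 = t free."""
    t = Fraction(t)
    return [3 - t, 4 - t, 6 - 2 * t, t]

def residual(x):
    """Return b - Ax; it is the zero vector exactly when x solves the system."""
    return [bi - sum(a * xj for a, xj in zip(row, x)) for row, bi in zip(A, b)]

for t in [0, 1, Fraction(-5, 2), 7]:
    print(t, [str(r) for r in residual(solution(t))])  # ['0', '0', '0'] for each t
```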


Note that in the reduced row-echelon form in Example 1.5, the variables
$x_1$, $x_2$, and $x_3$ correspond to the columns containing leading 1's,
while the column corresponding to $x_4$ contains no leading 1.

An augmented matrix of a system of linear equations may have more than one
row-echelon form, but it has only one reduced row-echelon form (see
Remark (2) on page 77 for a proof). Thus the number of leading 1's of a
system does not depend on the way the Gaussian elimination is carried out.
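The following is a quick numerical illustration of this uniqueness claim. It checks a single example and is not a proof. The second row-echelon form below is obtained from the one in Example 1.3 by adding its second row to its first row, and the result is still in row-echelon form. `rref` is a hypothetical helper that implements Gauss-Jordan elimination with exact fractions.

```python
from fractions import Fraction

def rref(M):
    """Gauss-Jordan elimination with exact fractions."""
    A = [[Fraction(x) for x in row] for row in M]
    m, n = len(A), len(A[0])
    r = 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if A[i][c] != 0), None)
        if p is None:
            continue
        A[r], A[p] = A[p], A[r]
        piv = A[r][c]
        A[r] = [x / piv for x in A[r]]
        for i in range(m):
            if i != r and A[i][c] != 0:
                k = A[i][c]
                A[i] = [x - k * y for x, y in zip(A[i], A[r])]
        r += 1
    return A

E1 = [[1, 2, 2, 3], [0, 1, 2, 1], [0, 0, 1, -2]]   # row-echelon form from Example 1.3
E2 = [[1, 3, 4, 4], [0, 1, 2, 1], [0, 0, 1, -2]]   # E1 with row 2 added to row 1
print(E1 == E2)              # False: two different row-echelon forms
print(rref(E1) == rref(E2))  # True: one and the same reduced row-echelon form
```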


Definition 1.3 Among the variables in a system, the ones corresponding to
the columns containing leading 1's are called the basic variables, and the
ones corresponding to the columns without leading 1's, if there are any,
are called the free variables.


Clearly the sum of the number of basic variables and the number of free variables is equal to the total number of unknowns, which is the number of columns of the coefficient matrix.
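Reading off basic and free variables from a reduced row-echelon form amounts to locating the first nonzero entry of each row. The sketch below assumes its input is already in reduced row-echelon form with the right-hand side as the last column; the helper name `classify_variables` is hypothetical.

```python
def classify_variables(rref_rows, n_unknowns):
    """Split variable indices (1-based) into basic and free ones.

    Assumes `rref_rows` is already in reduced row-echelon form; only the
    first `n_unknowns` columns are inspected, so the right-hand side is ignored.
    """
    pivot_cols = set()
    for row in rref_rows:
        for j in range(n_unknowns):
            if row[j] != 0:
                pivot_cols.add(j)  # first nonzero entry is a leading 1
                break
    basic = [j + 1 for j in range(n_unknowns) if j in pivot_cols]
    free = [j + 1 for j in range(n_unknowns) if j not in pivot_cols]
    return basic, free


# Reduced row-echelon form from Example 1.5 (4 unknowns plus the right-hand side).
R = [
    [1, 0, 0, 1, 3],
    [0, 1, 0, 1, 4],
    [0, 0, 1, 2, 6],
]
basic, free = classify_variables(R, 4)
print(basic, free)  # [1, 2, 3] [4]
assert len(basic) + len(free) == 4  # every unknown is either basic or free
```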

In Example 1.4, the first and the last augmented matrices have only basic variables but no free variables, while the second one has two basic variables $x_1$ and $x_3$, and two free variables $x_2$ and $x_4$. The third one has two basic variables $x_1$ and $x_2$, and only one free variable $x_3$.

In general, as we have seen in Example 1.5, a consistent system has infinitely many solutions if it has at least one free variable, and has a unique solution if it has no free variable. In fact, if a consistent system has a free variable (which always happens when the number of equations is less than the number of unknowns), then by assigning an arbitrary value to the free variable one always obtains infinitely many solutions.


Theorem 1.2 If a homogeneous system has more unknowns than equations, 
then it has infinitely many solutions. 


1.2. Gaussian elimination 13 


Remark: The arithmetical operations (or complexity) required to do the Gauss-Jordan elimination on $n$ equations in $n$ unknowns can be estimated as follows. For the moment, we consider only homogeneous systems. The operations consist of division of the pivot equation by the pivot and then subtraction of a multiple (say $k$) of the pivot equation from each equation beneath it. In the actual subtraction, we continually meet a “multiplication-subtraction” combination, which we count as a single operation.

At the beginning, for each of the $n - 1$ rows underneath the first one, it takes $n$ operations to find the multiple $k$ and to subtract $k$ times the first row from the row chosen. Consequently, we achieve zeros in the first column beneath the pivot. Thus this first stage of elimination needs $n(n - 1) = n^2 - n$ operations. (That is, among all the $n^2$ entries, all are changed except the $n$ entries in the first row.) At a later stage, when $k$ rows (including the pivot row) remain, only $k^2 - k$ operations are needed to clear out the column below the pivot. Thus, the total number of operations in the forward elimination is
\[
(1^2 + \cdots + n^2) - (1 + \cdots + n) = \frac{n(n+1)(2n+1)}{6} - \frac{n(n+1)}{2} = \frac{n^3 - n}{3}.
\]
In the back-substitution, the last unknown is found in only one operation (a division by the last pivot). The second to the last unknown requires two operations, and so on. Thus, the total number of operations for back-substitution is
\[
1 + 2 + \cdots + n = \frac{n(n+1)}{2} \approx \frac{n^2}{2}.
\]
For a nonhomogeneous system, one can easily find that another $\frac{n(n-1)}{2}$ steps are necessary in addition to the forward elimination, for the right side of the equations. However, $n^2$ is much smaller than $n^3$ when $n$ is very large. That is, when $n$ gets large, a good estimate for the number of operations is $\frac{n^3}{3}$.
Problem 1.2 Suppose that the augmented matrices for some systems of linear 
equations have been reduced to reduced row-echelon forms as below by elementary 


row operations. Solve the systems: 


100 5 1 0 0 4 -1 
alo ıı o -2|, @9lo102 6 
000 4 0013 2 


Problem 1.3 Solve the following systems of equations by Gaussian elimination. 
What are the pivots? 
=y + y SE 27 -= 0 2y = Se l 
(1) 3a + 4y + z = l0y + 3z = 
2r + 5y + 3z 3y = 


| 
=) 
=~ 
© 
Nos 
A 
8 
| 
| 
on 


| 
D 
w 
8 

| 

| 
D 




w + rt + y = 3 

(3) -3w - lfm + y + 2z = 1 
4w — 17x + 8y - 5z = 1 

— 5z - y + 2 = L 


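Problem 1.4 Determine the condition on the $b_i$’s so that each of the following systems has a solution.
\[
(1)\ \begin{cases} x + 2y + 6z = b_1 \\ 2x - 3y - 2z = b_2 \\ 3x - y + 4z = b_3 \end{cases}
\qquad
(2)\ \begin{cases} x + 3y - 2z = b_1 \\ 2x - y + 3z = b_2 \\ 4x + 2y + z = b_3 \end{cases}
\]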
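The operation counts in the Remark on the complexity of Gauss-Jordan elimination can be checked by brute-force summation. This is a sketch under our reading of the Remark's counting model (one operation per multiplication-subtraction, plus one to find each multiplier); the function names are hypothetical.

```python
def forward_ops(n):
    """Forward elimination under the Remark's counting model (our reading).

    When k rows remain, each of the k - 1 rows below the pivot costs k
    operations: one to find the multiplier, then k - 1 multiply-subtracts.
    """
    return sum(k * (k - 1) for k in range(1, n + 1))


def back_ops(n):
    """Back-substitution: the j-th unknown from the bottom costs j operations."""
    return sum(range(1, n + 1))


def rhs_ops(n):
    """Extra forward work on the right-hand side: one per row below each pivot."""
    return sum(k - 1 for k in range(1, n + 1))


for n in (3, 10, 100, 1000):
    assert forward_ops(n) == (n**3 - n) // 3
    assert back_ops(n) == n * (n + 1) // 2
    assert rhs_ops(n) == n * (n - 1) // 2
    total = forward_ops(n) + back_ops(n) + rhs_ops(n)
    # The ratio total / (n^3 / 3) approaches 1 as n grows.
    print(n, total, total / (n**3 / 3))
```

The printed ratios show why only the $n^3/3$ term matters for large $n$: the quadratic terms become negligible.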


1.3 Sums and scalar multiplications of matrices 


Rectangular arrays of real numbers arise in many real-world problems. Historically, it was the English mathematician A. Cayley who first introduced the word “matrix” in the year 1858. The meaning of the word is “that within which something originates,” and he used matrices simply as a source for rows and columns to form a rectangle.


Definition 1.4 An $m$ by $n$ (written $m \times n$) matrix is a rectangular array of numbers arranged into $m$ (horizontal) rows and $n$ (vertical) columns. The size of a matrix is specified by the number $m$ of the rows and the number $n$ of the columns.


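In general, a matrix is written in the following form:
\[
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = [a_{ij}]_{m \times n},
\]
or just $A = [a_{ij}]$ if the size of the matrix is clear from the context. The number $a_{ij}$ is called the $(i,j)$-entry of the matrix $A$, and is written as $a_{ij} = [A]_{ij}$.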

An $m \times 1$ matrix is called a column (matrix) or sometimes a column vector, and a $1 \times n$ matrix is called a row (matrix), or a row vector. In general, we use capital letters like $A, B, C$ for matrices and small boldface letters like $\mathbf{x}, \mathbf{y}, \mathbf{z}$ for column or row vectors.




Definition 1.5 Let $A = [a_{ij}]$ be an $m \times n$ matrix. The transpose of $A$ is the $n \times m$ matrix, denoted by $A^T$, whose $j$-th column is taken from the $j$-th row of $A$: that is, $[A^T]_{ij} = [A]_{ji}$.


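Example 1.6 (1) If $A = \begin{bmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{bmatrix}$, then $A^T = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}$.

(2) The transpose of a column vector is a row vector and vice versa:
\[
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}^T = [x_1 \ x_2 \ \cdots \ x_n].
\]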
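With a matrix stored as a Python list of rows (an assumption of these sketches, not the book's notation), the transpose is the list of columns. The helper name `transpose` is ours.

```python
def transpose(a):
    """Return A^T: row i of the result is column i of A, so [A^T]_ij = [A]_ji."""
    return [list(col) for col in zip(*a)]


A = [[1, 3, 5],
     [2, 4, 6]]
print(transpose(A))                # [[1, 2], [3, 4], [5, 6]]
print(transpose([[7], [8], [9]]))  # a column vector becomes the row [[7, 8, 9]]
assert transpose(transpose(A)) == A  # (A^T)^T = A
```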


Definition 1.6 An $m \times n$ matrix $A = [a_{ij}]$ is called a square matrix of order $n$ if $m = n$.


Definition 1.7 Let $A$ be a square matrix of order $n$.
(1) The entries $a_{11}, a_{22}, \ldots, a_{nn}$ are called the diagonal entries of $A$.

(2) $A$ is called a diagonal matrix if all the entries except for the diagonal entries are zero.

(3) $A$ is called an upper (lower) triangular matrix if all the entries below (above, respectively) the diagonal are zero.


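The following matrices $U$ and $L$ are the general forms of the upper triangular and lower triangular matrices, respectively:
\[
U = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ 0 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & a_{nn} \end{bmatrix}, \qquad
L = \begin{bmatrix} a_{11} & 0 & \cdots & 0 \\ a_{21} & a_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}.
\]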


Note that a matrix which is both upper and lower triangular must be a diag- 
onal matrix, and the transpose of an upper (lower, respectively) triangular 
matrix is lower (upper, respectively) triangular. 


Definition 1.8 Two matrices $A$ and $B$ are said to be equal, written $A = B$, if their sizes are the same and their corresponding entries are equal, i.e., $[A]_{ij} = [B]_{ij}$ for all $i$ and $j$.




This definition allows us to write matrix equations. A simple example is $(A^T)^T = A$, which holds by definition.

Let $M_{m \times n}(\mathbb{R})$ denote the set of all $m \times n$ matrices with real entries. On $M_{m \times n}(\mathbb{R})$, one can define two operations, called the scalar multiplication and the sum of matrices, as follows:


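Definition 1.9 (1) (Scalar multiplication) For an $m \times n$ matrix $A = [a_{ij}] \in M_{m \times n}(\mathbb{R})$ and a scalar $k \in \mathbb{R}$ (which is simply a real number), the scalar multiplication of $k$ and $A$ is defined to be the matrix $kA$ such that $[kA]_{ij} = k[A]_{ij}$ for all $i$ and $j$, i.e., in an expanded form:
\[
k \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}
= \begin{bmatrix} ka_{11} & \cdots & ka_{1n} \\ \vdots & & \vdots \\ ka_{m1} & \cdots & ka_{mn} \end{bmatrix}.
\]

(2) (Sum of matrices) For two matrices $A = [a_{ij}]$ and $B = [b_{ij}]$ in $M_{m \times n}(\mathbb{R})$, the sum of $A$ and $B$ is defined to be the matrix $A + B$ such that $[A + B]_{ij} = [A]_{ij} + [B]_{ij}$ for all $i$ and $j$, i.e., in an expanded form:
\[
\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}
+ \begin{bmatrix} b_{11} & \cdots & b_{1n} \\ \vdots & & \vdots \\ b_{m1} & \cdots & b_{mn} \end{bmatrix}
= \begin{bmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{bmatrix}.
\]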


The resulting matrices $kA$ and $A + B$ from these two operations also belong to $M_{m \times n}(\mathbb{R})$. In this sense, we say $M_{m \times n}(\mathbb{R})$ is closed under the two operations. Note that matrices of different sizes cannot be added; for example, a sum
\[
\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} a & b & c \\ d & e & f \end{bmatrix}
\]
cannot be defined.

If $B$ is any matrix, then $-B$ is by definition the scalar multiple $(-1)B$. Moreover, if $A$ and $B$ are two matrices of the same size, then the subtraction $A - B$ is by definition the sum $A + (-1)B$. A matrix whose entries are all zeros is called a zero matrix, denoted by the symbol $0$ (or $0_{m \times n}$ when the size is emphasized).

Clearly, the matrix sum has the same properties as the sum of real numbers. The real numbers in the context here are traditionally called scalars even though “numbers” is a perfectly good name and “scalar” sounds more technical. The following theorem lists the basic arithmetic rules of the sum and scalar multiplication of matrices.




Theorem 1.3 Suppose that the sizes of $A$, $B$ and $C$ are the same. Then the following arithmetic rules of matrices are valid:

(1) $(A + B) + C = A + (B + C)$ (written as $A + B + C$) (Associativity),
(2) $A + 0 = 0 + A = A$,
(3) $A + (-A) = (-A) + A = 0$,
(4) $A + B = B + A$ (Commutativity),
(5) $k(A + B) = kA + kB$,
(6) $(k + \ell)A = kA + \ell A$,
(7) $(k\ell)A = k(\ell A)$.


Proof: We prove only rule (5); the remaining ones are left as exercises. For any $(i, j)$,
\[
[k(A + B)]_{ij} = k[A + B]_{ij} = k([A]_{ij} + [B]_{ij}) = [kA]_{ij} + [kB]_{ij} = [kA + kB]_{ij}.
\]


In particular, A + A = 2A, A + (A + A) = 3A = (A + A) + A, and 
inductively nA = (n — 1)A + A for any positive integer n. 
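A numerical spot check of several rules of Theorem 1.3 can be written in a few lines. This is a sketch, not a proof: the matrices and scalars below are arbitrary illustrative choices, and the helper names `add` and `scale` are ours.

```python
def add(a, b):
    """Entrywise sum; defined only for matrices of the same size."""
    if len(a) != len(b) or any(len(r) != len(s) for r, s in zip(a, b)):
        raise ValueError("matrices of different sizes cannot be added")
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def scale(k, a):
    """Scalar multiple kA: every entry of A is multiplied by k."""
    return [[k * x for x in row] for row in a]


# Arbitrary illustrative data.
A = [[1, -2], [0, 3]]
B = [[4, 1], [-1, 2]]
C = [[0, 5], [2, -3]]
k, ell = 3, -2

assert add(add(A, B), C) == add(A, add(B, C))                  # rule (1)
assert add(A, scale(-1, A)) == [[0, 0], [0, 0]]                # rule (3)
assert add(A, B) == add(B, A)                                  # rule (4)
assert scale(k, add(A, B)) == add(scale(k, A), scale(k, B))    # rule (5)
assert scale(k + ell, A) == add(scale(k, A), scale(ell, A))    # rule (6)
assert scale(k * ell, A) == scale(k, scale(ell, A))            # rule (7)
```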


Definition 1.10 A square matrix $A$ is said to be symmetric if $A^T = A$, or skew-symmetric if $A^T = -A$.


For example, the matrices $A$ and $B$ below
\[
A = \begin{bmatrix} 1 & a & b \\ a & 3 & c \\ b & c & 5 \end{bmatrix}, \qquad
B = \begin{bmatrix} 0 & 1 & 2 \\ -1 & 0 & 3 \\ -2 & -3 & 0 \end{bmatrix}
\]
are symmetric and skew-symmetric, respectively. Notice that all the diagonal entries of a skew-symmetric matrix must be zero, since $a_{ii} = -a_{ii}$.
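The two definitions translate directly into comparisons with the transpose. In this sketch, $B$ is the skew-symmetric example above, while the symbolic entries $a, b, c$ of $A$ are replaced by arbitrary illustrative values ($a = 2$, $b = 7$, $c = -4$); the function names are ours.

```python
def transpose(a):
    return [list(col) for col in zip(*a)]


def is_symmetric(a):
    """A^T = A."""
    return a == transpose(a)


def is_skew_symmetric(a):
    """A^T = -A."""
    return transpose(a) == [[-x for x in row] for row in a]


# a, b, c of the text replaced by illustrative values 2, 7, -4.
A = [[1, 2, 7], [2, 3, -4], [7, -4, 5]]
B = [[0, 1, 2], [-1, 0, 3], [-2, -3, 0]]
print(is_symmetric(A), is_skew_symmetric(B))  # True True
# The diagonal entries of a skew-symmetric matrix are zero.
assert all(B[i][i] == 0 for i in range(3))
```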

By a direct computation, one can easily verify the following rules of the transpose of matrices:

Theorem 1.4 Let $A$ and $B$ be $m \times n$ matrices, and let $k$ be a scalar. Then
\[
(kA)^T = kA^T \quad\text{and}\quad (A + B)^T = A^T + B^T.
\]


Problem 1.5 Prove the remaining parts of Theorem 1.3. 




Problem 1.6 Find a matrix $B$ such that $A + B^T = (A - B)^T$, where
\[
A = \begin{bmatrix} 2 & -3 & 0 \\ 4 & -1 & 3 \\ -1 & 0 & 1 \end{bmatrix}.
\]


Problem 1.7 Find a, b, c and d such that 


a b| |a 3| |2+t8 a+9 
c d| “|2 ate c+d b |? 


1.4 Products of matrices 


The sum and the scalar multiplication of matrices were introduced in Section 
1.3. In this section, we introduce the product of matrices. Unlike the sum of 
two matrices, the product of matrices is a little bit more complicated, in the 
sense that it can be defined for two matrices of different sizes. The product 
of matrices will be defined in three steps: 


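Step (1) Product of vectors: For a $1 \times n$ row vector $\mathbf{a} = [a_1 \ a_2 \ \cdots \ a_n]$ and an $n \times 1$ column vector $\mathbf{x} = [x_1 \ x_2 \ \cdots \ x_n]^T$, the product $\mathbf{a}\mathbf{x}$ is a $1 \times 1$ matrix (i.e., just a number) defined by the rule
\[
\mathbf{a}\mathbf{x} = [a_1 \ a_2 \ \cdots \ a_n] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = [a_1x_1 + a_2x_2 + \cdots + a_nx_n] = \Big[\sum_{i=1}^{n} a_ix_i\Big].
\]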


Note that the number of entries of the first row vector is equal to the number 
of entries of the second column vector, so that an entrywise multiplication 
is possible. 


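Step (2) Product of a matrix and a vector: For an $m \times n$ matrix
\[
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_m \end{bmatrix}
\]
with the row vectors $\mathbf{a}_i$’s and for an $n \times 1$ column vector $\mathbf{x} = [x_1 \ \cdots \ x_n]^T$, the product $A\mathbf{x}$ is by definition an $m \times 1$ matrix, whose $m$ rows are computed according to Step (1):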




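\[
A\mathbf{x} = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
= \begin{bmatrix} \mathbf{a}_1\mathbf{x} \\ \mathbf{a}_2\mathbf{x} \\ \vdots \\ \mathbf{a}_m\mathbf{x} \end{bmatrix}
= \begin{bmatrix} \sum_{i=1}^{n} a_{1i}x_i \\ \sum_{i=1}^{n} a_{2i}x_i \\ \vdots \\ \sum_{i=1}^{n} a_{mi}x_i \end{bmatrix}.
\]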


Therefore, for a system of m linear equations in n unknowns, by writing the 
n unknowns as an n x 1 column matrix x and the coefficients as an m x n 
matrix A, the system may be expressed as a matrix equation Ax = b. 
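Step (2) computes each entry of $A\mathbf{x}$ as a row-times-column product. The sketch below (hypothetical helper `mat_vec`, matrices as lists of rows) applies it to the system of Example 1.5, $x_1 + 3x_2 - 2x_3 = 3$, $2x_1 + 6x_2 - 2x_3 + 4x_4 = 18$, $x_2 + x_3 + 3x_4 = 10$, and confirms that the solutions found there satisfy $A\mathbf{x} = \mathbf{b}$ for a few values of $t$.

```python
def mat_vec(a, x):
    """Ax as in Step (2): the i-th entry is the row-times-column product a_i x."""
    if any(len(row) != len(x) for row in a):
        raise ValueError("number of columns of A must equal the length of x")
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


# Coefficient matrix and right-hand side of Example 1.5.
A = [[1, 3, -2, 0],
     [2, 6, -2, 4],
     [0, 1, 1, 3]]
b = [3, 18, 10]

# The solutions (3 - t, 4 - t, 6 - 2t, t) satisfy Ax = b.
for t in range(-2, 3):
    x = [3 - t, 4 - t, 6 - 2 * t, t]
    assert mat_vec(A, x) == b
```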


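Step (3) Product of matrices: Let $A$ be an $m \times n$ matrix and $B$ an $n \times r$ matrix with columns $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_r$, written as $B = [\mathbf{b}_1 \ \mathbf{b}_2 \ \cdots \ \mathbf{b}_r]$. The product $AB$ is defined to be an $m \times r$ matrix whose $r$ columns are the products of $A$ and the $r$ columns of $B$, each computed according to Step (2) in corresponding order. That is,
\[
AB = [A\mathbf{b}_1 \ A\mathbf{b}_2 \ \cdots \ A\mathbf{b}_r] =
\begin{bmatrix} \mathbf{a}_1\mathbf{b}_1 & \mathbf{a}_1\mathbf{b}_2 & \cdots & \mathbf{a}_1\mathbf{b}_r \\ \mathbf{a}_2\mathbf{b}_1 & \mathbf{a}_2\mathbf{b}_2 & \cdots & \mathbf{a}_2\mathbf{b}_r \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{a}_m\mathbf{b}_1 & \mathbf{a}_m\mathbf{b}_2 & \cdots & \mathbf{a}_m\mathbf{b}_r \end{bmatrix},
\]
which is an $m \times r$ matrix. Therefore, the $(i,j)$-entry $[AB]_{ij}$ of $AB$ is:
\[
[AB]_{ij} = \mathbf{a}_i\mathbf{b}_j = a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj} = \sum_{k=1}^{n} a_{ik}b_{kj}.
\]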
This can be easily memorized as the sum of entrywise multiplications of the 
shaded vectors in Figure 1.3. 


Figure 1.3: The entry $[AB]_{ij}$ (the $i$-th row of $A$ and the $j$-th column of $B$ are shaded)


Example 1.7 Consider the matrices
\[
A = \begin{bmatrix} 2 & 3 \\ 4 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & 2 & 0 \\ 5 & -1 & 0 \end{bmatrix}.
\]




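The columns of $AB$ are the products of $A$ and each column of $B$:
\[
\begin{bmatrix} 2 & 3 \\ 4 & 0 \end{bmatrix} \begin{bmatrix} 1 \\ 5 \end{bmatrix} = \begin{bmatrix} 2 \cdot 1 + 3 \cdot 5 \\ 4 \cdot 1 + 0 \cdot 5 \end{bmatrix} = \begin{bmatrix} 17 \\ 4 \end{bmatrix},
\]
\[
\begin{bmatrix} 2 & 3 \\ 4 & 0 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \end{bmatrix} = \begin{bmatrix} 2 \cdot 2 + 3 \cdot (-1) \\ 4 \cdot 2 + 0 \cdot (-1) \end{bmatrix} = \begin{bmatrix} 1 \\ 8 \end{bmatrix},
\]
\[
\begin{bmatrix} 2 & 3 \\ 4 & 0 \end{bmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \cdot 0 + 3 \cdot 0 \\ 4 \cdot 0 + 0 \cdot 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\]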


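Therefore, $AB$ is
\[
\begin{bmatrix} 2 & 3 \\ 4 & 0 \end{bmatrix} \begin{bmatrix} 1 & 2 & 0 \\ 5 & -1 & 0 \end{bmatrix} = \begin{bmatrix} 17 & 1 & 0 \\ 4 & 8 & 0 \end{bmatrix}.
\]
Since $A$ is a $2 \times 2$ matrix and $B$ is a $2 \times 3$ matrix, the product $AB$ is a $2 \times 3$ matrix. If we concentrate, for example, on the $(2,1)$-entry of $AB$, we single out the second row from $A$ and the first column from $B$, and then we multiply corresponding entries together and add them up, i.e., $4 \cdot 1 + 0 \cdot 5 = 4$.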


Note that the product AB of A and B is not defined if the number of 
columns of A and the number of rows of B are not equal. 
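The entry formula $[AB]_{ij} = \sum_k a_{ik}b_{kj}$ and the size condition can be sketched together as follows (hypothetical helper `mat_mul`, matrices as lists of rows), using the matrices of Example 1.7.

```python
def mat_mul(a, b):
    """Product AB with [AB]_ij = sum_k a_ik * b_kj (Step (3))."""
    n = len(b)  # number of rows of B
    if any(len(row) != n for row in a):
        raise ValueError("number of columns of A must equal number of rows of B")
    r = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(n)) for j in range(r)]
            for row in a]


A = [[2, 3],
     [4, 0]]
B = [[1, 2, 0],
     [5, -1, 0]]
print(mat_mul(A, B))  # [[17, 1, 0], [4, 8, 0]]

try:
    mat_mul(B, A)  # B has 3 columns but A has 2 rows
except ValueError as err:
    print("BA is not defined:", err)
```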


Remark: In Step (2), instead of defining a product of a matrix and a vector, one can alternatively define the product of a $1 \times n$ row matrix $\mathbf{x}$ and an $n \times r$ matrix $B$ using the same rule defined in Step (1) to obtain a $1 \times r$ row matrix $\mathbf{x}B$. Accordingly, in Step (3) an appropriate modification produces the same definition of the product of two matrices. The reader is encouraged to verify this (see Example 1.10).


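The identity matrix of order $n$, denoted by $I_n$ (or $I$ if the order is clear from the context), is a diagonal matrix whose diagonal entries are all 1, i.e.,
\[
I_n = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}.
\]
By a direct computation, one can easily see that $AI_n = A = I_nA$ for any $n \times n$ matrix $A$.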

The operations of the scalar multiplication, the sum and the product of 
matrices satisfy many, but not all, of the same arithmetic rules that the real 




or complex numbers have. The matrix $0_{m \times n}$ plays the role of the number 0, and $I_n$ plays that of the number 1 in the set of usual numbers.

The rule that does not hold for matrices in general is the commutativity $AB = BA$ of the product, while the commutativity of the matrix sum $A + B = B + A$ always holds. The following example illustrates the non-commutativity of the product of matrices.


Example 1.8 (Non-commutativity of the matrix product) 


1 0 0 1 
Let A=| j e and gl 5 | Then, 


which shows $AB \neq BA$.
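Non-commutativity is easy to observe numerically. The sketch below uses an illustrative pair of $2 \times 2$ matrices of our own choosing (not necessarily those of Example 1.8) and a hypothetical helper `mat_mul`.

```python
def mat_mul(a, b):
    """Product AB with [AB]_ij = sum_k a_ik * b_kj."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[1, 0], [0, 0]]  # illustrative choice
B = [[0, 1], [0, 0]]  # illustrative choice
AB, BA = mat_mul(A, B), mat_mul(B, A)
print(AB)  # [[0, 1], [0, 0]]
print(BA)  # [[0, 0], [0, 0]]
assert AB != BA
```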


The following theorem lists the basic arithmetic rules that do hold in 
matrix product. 


Theorem 1.5 Let $A$, $B$, $C$ be arbitrary matrices for which the matrix operations below can be defined, and let $k$ be an arbitrary scalar. Then

(1) $A(BC) = (AB)C$ (written as $ABC$) (Associativity),
(2) $A(B + C) = AB + AC$, and $(A + B)C = AC + BC$ (Distributivity),
(3) $IA = A = AI$,
(4) $k(BC) = (kB)C = B(kC)$,
(5) $(AB)^T = B^TA^T$.


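Proof: Each equality can be shown by direct computation of each entry of both sides of the equalities. We illustrate this by proving (1) only, and leave the others to the reader.

Assume that $A = [a_{ij}]$ is an $m \times n$ matrix, $B = [b_{k\ell}]$ is an $n \times p$ matrix, and $C = [c_{st}]$ is a $p \times r$ matrix. We now compute the $(i,j)$-entry of each side of the equation. Note that $BC$ is an $n \times r$ matrix whose $(\mu, j)$-entry is $[BC]_{\mu j} = \sum_{\lambda=1}^{p} b_{\mu\lambda}c_{\lambda j}$. Thus
\[
[A(BC)]_{ij} = \sum_{\mu=1}^{n} a_{i\mu}[BC]_{\mu j} = \sum_{\mu=1}^{n} a_{i\mu} \sum_{\lambda=1}^{p} b_{\mu\lambda}c_{\lambda j} = \sum_{\mu=1}^{n} \sum_{\lambda=1}^{p} a_{i\mu}b_{\mu\lambda}c_{\lambda j}.
\]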




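Similarly, $AB$ is an $m \times p$ matrix with the $(i, \lambda)$-entry $[AB]_{i\lambda} = \sum_{\mu=1}^{n} a_{i\mu}b_{\mu\lambda}$, and
\[
[(AB)C]_{ij} = \sum_{\lambda=1}^{p} [AB]_{i\lambda}c_{\lambda j} = \sum_{\lambda=1}^{p} \sum_{\mu=1}^{n} a_{i\mu}b_{\mu\lambda}c_{\lambda j} = \sum_{\mu=1}^{n} \sum_{\lambda=1}^{p} a_{i\mu}b_{\mu\lambda}c_{\lambda j}.
\]
This clearly shows that $[A(BC)]_{ij} = [(AB)C]_{ij}$ for all $i, j$, and consequently $A(BC) = (AB)C$ as desired.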
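Rules (1) and (5) of Theorem 1.5 can be spot-checked on matrices of compatible but different sizes. The data below are arbitrary illustrative choices ($A$ is $2 \times 3$, $B$ is $3 \times 4$, $C$ is $4 \times 2$); a numerical check like this is only a sanity test, not a proof.

```python
def mat_mul(a, b):
    """Product AB with [AB]_ij = sum_k a_ik * b_kj."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


# Arbitrary illustrative matrices of sizes 2x3, 3x4 and 4x2.
A = [[1, -2, 0], [3, 1, 4]]
B = [[2, 0, 1, -1], [0, 3, -2, 1], [1, 1, 0, 2]]
C = [[1, 0], [-1, 2], [0, 1], [3, -1]]

assert mat_mul(A, mat_mul(B, C)) == mat_mul(mat_mul(A, B), C)            # (1)
assert transpose(mat_mul(A, B)) == mat_mul(transpose(B), transpose(A))  # (5)
```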


Problem 1.8 Give an example of matrices $A$ and $B$ such that $(AB)^T \neq A^TB^T$.


Problem 1.9 Prove or disprove: If A is not a zero matrix and AB = AC, then 
B = C. Similarly, is it true or not that AB = 0 implies A = 0 or B= 0? 


Problem 1.10 Show that any triangular matrix $A$ satisfying $AA^T = A^TA$ is a diagonal matrix.


Problem 1.11 For a square matrix $A$, show that
(1) $AA^T$ and $A + A^T$ are symmetric,
(2) $A - A^T$ is skew-symmetric, and
(3) $A$ can be expressed as the sum of the symmetric part $B = \frac{1}{2}(A + A^T)$ and the skew-symmetric part $C = \frac{1}{2}(A - A^T)$, so that $A = B + C$.


As an application of our results on matrix operations, one can prove the 
following important theorem: 


Theorem 1.6 Any system of linear equations has either no solution, exactly 
one solution, or infinitely many solutions. 


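Proof: We have seen that a system of linear equations may be written in matrix form as $A\mathbf{x} = \mathbf{b}$. This system may have either no solution or a solution. If it has only one solution, then there is nothing to prove. Suppose that the system has more than one solution and let $\mathbf{x}_1$ and $\mathbf{x}_2$ be two different solutions so that $A\mathbf{x}_1 = \mathbf{b}$ and $A\mathbf{x}_2 = \mathbf{b}$. Let $\mathbf{x}_0 = \mathbf{x}_1 - \mathbf{x}_2$. Then $\mathbf{x}_0 \neq \mathbf{0}$, and $A\mathbf{x}_0 = A(\mathbf{x}_1 - \mathbf{x}_2) = \mathbf{0}$. Thus
\[
A(\mathbf{x}_1 + k\mathbf{x}_0) = A\mathbf{x}_1 + kA\mathbf{x}_0 = \mathbf{b}.
\]
This means that $\mathbf{x}_1 + k\mathbf{x}_0$ is also a solution of $A\mathbf{x} = \mathbf{b}$ for any scalar $k$. Since $\mathbf{x}_0 \neq \mathbf{0}$, different values of $k$ give different vectors $\mathbf{x}_1 + k\mathbf{x}_0$, and there are infinitely many choices for $k$; hence $A\mathbf{x} = \mathbf{b}$ has infinitely many solutions.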
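The argument of Theorem 1.6 can be traced on the system of Example 1.5. In this sketch, two solutions are taken from that example ($t = 0$ and $t = 1$), their difference $\mathbf{x}_0$ solves the homogeneous system, and every $\mathbf{x}_1 + k\mathbf{x}_0$ is again a solution; the helper name `mat_vec` is ours.

```python
def mat_vec(a, x):
    """Ax, one row-times-column product per entry."""
    return [sum(r * s for r, s in zip(row, x)) for row in a]


# The system of Example 1.5.
A = [[1, 3, -2, 0], [2, 6, -2, 4], [0, 1, 1, 3]]
b = [3, 18, 10]

x1 = [3, 4, 6, 0]                      # solution with t = 0
x2 = [2, 3, 4, 1]                      # solution with t = 1
x0 = [p - q for p, q in zip(x1, x2)]   # x0 = x1 - x2 is nonzero
assert mat_vec(A, x0) == [0, 0, 0]     # A x0 = 0

# x1 + k x0 solves Ax = b for every k, and different k give different vectors.
sols = {tuple(p + k * q for p, q in zip(x1, x0)) for k in range(-5, 6)}
assert all(mat_vec(A, list(s)) == b for s in sols)
print(len(sols))  # 11 distinct solutions from 11 values of k
```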




Problem 1.12 For which values of a does each of the following systems have no 
solution, exactly one solution, or infinitely many solutions? 


z + 2y 

(1) 4 3z - y 
4r + y 

t = y 

(2) x + 3y 
2x + ay 


1.5 Block matrices 


+++ ++ 


3z 

5z 

(a? — 14)z 
2 1 
az 2 
3z 3. 


a +2. 


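In this section we introduce some techniques that may be helpful in manipulations of matrices. A submatrix of a matrix $A$ is a matrix obtained from $A$ by deleting certain rows and/or columns of $A$. Using some horizontal and vertical lines, one can partition a matrix $A$ into submatrices, called blocks, of $A$ as follows: Consider a matrix
\[
A = \left[\begin{array}{ccc|c} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \\ \hline a_{31} & a_{32} & a_{33} & a_{34} \end{array}\right],
\]
divided up into four blocks by the lines shown. Now, if we write
\[
A_{11} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}, \quad
A_{12} = \begin{bmatrix} a_{14} \\ a_{24} \end{bmatrix}, \quad
A_{21} = [a_{31} \ a_{32} \ a_{33}], \quad
A_{22} = [a_{34}],
\]
then $A$ can be written as
\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},
\]
called a block matrix.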


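The product of matrices partitioned into blocks also follows the matrix product formula, as if the blocks $A_{ij}$ were numbers: If
\[
A = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix} \quad\text{and}\quad B = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}
\]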




are block matrices and the number of columns in $A_{ik}$ is equal to the number of rows in $B_{kj}$, then
\[
AB = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{bmatrix}.
\]
This will be true only if the columns of $A$ are partitioned in the same way as the rows of $B$.

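It is not hard to see that the matrix product by blocks is correct. Suppose, for example, that we have a $3 \times 3$ matrix $A$ and partition it as
\[
A = \left[\begin{array}{cc|c} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ \hline a_{31} & a_{32} & a_{33} \end{array}\right] = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix},
\]
and a $3 \times 2$ matrix $B$ which we partition as
\[
B = \left[\begin{array}{cc} b_{11} & b_{12} \\ b_{21} & b_{22} \\ \hline b_{31} & b_{32} \end{array}\right] = \begin{bmatrix} B_{11} \\ B_{21} \end{bmatrix}.
\]
Then the entries of $C = [c_{ij}] = AB$ are
\[
c_{ij} = (a_{i1}b_{1j} + a_{i2}b_{2j}) + a_{i3}b_{3j}.
\]
The quantity $a_{i1}b_{1j} + a_{i2}b_{2j}$ is simply the $(i,j)$-entry of $A_{11}B_{11}$ if $i \leq 2$, and is the $(1,j)$-entry of $A_{21}B_{11}$ if $i = 3$. Similarly, $a_{i3}b_{3j}$ is the $(i,j)$-entry of $A_{12}B_{21}$ if $i \leq 2$, and the $(1,j)$-entry of $A_{22}B_{21}$ if $i = 3$. Thus $AB$ can be written as
\[
AB = \begin{bmatrix} C_{11} \\ C_{21} \end{bmatrix} = \begin{bmatrix} A_{11}B_{11} + A_{12}B_{21} \\ A_{21}B_{11} + A_{22}B_{21} \end{bmatrix}.
\]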
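The block computation for a $3 \times 3$ matrix times a $3 \times 2$ matrix can be checked directly. In this sketch the numerical entries are illustrative choices of ours, both matrices are split after the second row (and $A$ after its second column) as in the text, and the helper names `block`, `mat_mul`, `mat_add` are hypothetical.

```python
def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def mat_add(a, b):
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]


def block(m, rows, cols):
    """Submatrix of m with the given row and column index ranges (0-based)."""
    return [[m[i][j] for j in cols] for i in rows]


# Illustrative 3x3 A and 3x2 B, partitioned as in the text.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, -1], [0, 2], [3, 1]]

A11, A12 = block(A, range(2), range(2)), block(A, range(2), range(2, 3))
A21, A22 = block(A, range(2, 3), range(2)), block(A, range(2, 3), range(2, 3))
B11, B21 = block(B, range(2), range(2)), block(B, range(2, 3), range(2))

C11 = mat_add(mat_mul(A11, B11), mat_mul(A12, B21))  # rows 1-2 of AB
C21 = mat_add(mat_mul(A21, B11), mat_mul(A22, B21))  # row 3 of AB
assert C11 + C21 == mat_mul(A, B)  # stacking the row blocks gives AB
print(C11 + C21)  # [[10, 6], [22, 12], [34, 18]]
```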


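Example 1.9 If an $m \times n$ matrix $A$ is partitioned into blocks of column vectors, i.e., $A = [\mathbf{c}_1 \ \mathbf{c}_2 \ \cdots \ \mathbf{c}_n]$, where each block $\mathbf{c}_j$ is the $j$-th column, then the product $A\mathbf{x}$ with $\mathbf{x} = [x_1 \ \cdots \ x_n]^T$ is the sum of the block matrices (or column vectors) with coefficients $x_j$’s:
\[
A\mathbf{x} = [\mathbf{c}_1 \ \mathbf{c}_2 \ \cdots \ \mathbf{c}_n] \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n,
\]
where $x_j\mathbf{c}_j = x_j[a_{1j} \ a_{2j} \ \cdots \ a_{mj}]^T$. Hence, a matrix equation $A\mathbf{x} = \mathbf{b}$ is nothing but a vector equation $x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n = \mathbf{b}$.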
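The column view of $A\mathbf{x}$ in Example 1.9 and the row view of Step (2) give the same vector. This sketch computes both ways (helper names are ours) on the coefficient matrix of Example 1.5 with one of its solutions.

```python
def mat_vec_rows(a, x):
    """Ax by rows: entry i is the product of row i of A with x."""
    return [sum(r * s for r, s in zip(row, x)) for row in a]


def mat_vec_columns(a, x):
    """Ax as the combination x_1 c_1 + ... + x_n c_n of the columns c_j of A."""
    result = [0] * len(a)
    for j, xj in enumerate(x):
        column = [row[j] for row in a]
        result = [r + xj * c for r, c in zip(result, column)]
    return result


A = [[1, 3, -2, 0], [2, 6, -2, 4], [0, 1, 1, 3]]  # coefficient matrix of Example 1.5
x = [3, 4, 6, 0]
assert mat_vec_rows(A, x) == mat_vec_columns(A, x) == [3, 18, 10]
```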




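Example 1.10 Let $A$ be an $m \times n$ matrix partitioned into the row vectors $\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_m$ as its blocks, and let $B$ be an $n \times r$ matrix so that their product $AB$ is well-defined. By considering the matrix $B$ as a block, the product $AB$ can be written
\[
AB = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_m \end{bmatrix} B = \begin{bmatrix} \mathbf{a}_1B \\ \mathbf{a}_2B \\ \vdots \\ \mathbf{a}_mB \end{bmatrix} = \begin{bmatrix} \mathbf{a}_1\mathbf{b}_1 & \mathbf{a}_1\mathbf{b}_2 & \cdots & \mathbf{a}_1\mathbf{b}_r \\ \mathbf{a}_2\mathbf{b}_1 & \mathbf{a}_2\mathbf{b}_2 & \cdots & \mathbf{a}_2\mathbf{b}_r \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{a}_m\mathbf{b}_1 & \mathbf{a}_m\mathbf{b}_2 & \cdots & \mathbf{a}_m\mathbf{b}_r \end{bmatrix},
\]
where $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_r$ denote the columns of $B$. Hence, the $i$-th row vector of $AB$ is the product $\mathbf{a}_iB$ of the $i$-th row vector of $A$ and the matrix $B$, whose entries are the products $\mathbf{a}_i\mathbf{b}_j$ with the column vectors of $B$.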


Problem 1.13 Compute AB using block multiplication, where 


1 2ļ1 0 : nie 

Peal -3 4|0 1l, B= 
0. 0) 2 ot A E: 
Bu, Sie 


1.6 Inverse matrices 


As shown in Section 1.4, a system of linear equations can be written as $A\mathbf{x} = \mathbf{b}$ in matrix form. This form resembles one of the simplest linear equations in one variable, $ax = b$, whose solution is simply $x = a^{-1}b$ when $a \neq 0$. Thus it is tempting to write the solution of the system as $\mathbf{x} = A^{-1}\mathbf{b}$. However, in the case of matrices we first have to give a meaning to $A^{-1}$. To discuss this we begin with the following definition.


Definition 1.11 For an $m \times n$ matrix $A$, an $n \times m$ matrix $B$ is called a left inverse of $A$ if $BA = I_n$, and an $n \times m$ matrix $C$ is called a right inverse of $A$ if $AC = I_m$.


A matrix $A$ has a right inverse if and only if $A^T$ has a left inverse, since $(AB)^T = B^TA^T$ and $I^T = I$. In general, a matrix with a left (right) inverse need not have a right (left, respectively) inverse.


Example 1.11 (One-sided inverse) From a direct calculation for the two matrices
\[
A = \begin{bmatrix} 1 & 2 & -1 \\ 2 & 0 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 1 & -3 \\ -1 & 5 \\ -2 & 7 \end{bmatrix},
\]
we have $AB = I_2$, and
\[
BA = \begin{bmatrix} -5 & 2 & -4 \\ 9 & -2 & 6 \\ 12 & -4 & 9 \end{bmatrix} \neq I_3.
\]
Thus, the matrix $B$ is a right inverse but not a left inverse of $A$, while $A$ is a left inverse but not a right inverse of $B$.


The following lemma shows that if a matrix has both a left inverse and 
a right inverse, then they must be equal: 


Lemma 1.7 If an $n \times n$ square matrix $A$ has a left inverse $B$ and a right inverse $C$, then $B$ and $C$ are equal, i.e., $B = C$.


Proof: A direct calculation shows that
\[
B = BI_n = B(AC) = (BA)C = I_nC = C.
\]


Therefore, if a matrix A has both left and right inverses, then any two 
left inverses must be both equal to a right inverse C, and hence to each 
other, and any two right inverses must be both equal to a left inverse B, 
and hence to each other. So there exists only one left and only one right 
inverse, which must be equal by Lemma 1.7. 

We will show later (Theorem 1.9) that if $A$ is a square matrix and has a left inverse, then it also has a right inverse, and vice versa. Moreover, Lemma 1.7 says that the left inverse and the right inverse must be equal. However, we shall also show in Chapter 3 that a non-square matrix $A$ cannot have both a right inverse and a left inverse: that is, a non-square matrix can have an inverse on at most one side. The following example shows that, in this case, the matrix may have infinitely many one-sided inverses.


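Example 1.12 (Infinitely many one-sided inverses) A non-square matrix
\[
A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}
\]
can have more than one left inverse. In fact, for any $x, y \in \mathbb{R}$, the matrix
\[
B = \begin{bmatrix} 1 & 0 & x \\ 0 & 1 & y \end{bmatrix}
\]
is a left inverse of $A$.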
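The family of left inverses in Example 1.12 can be checked for a few sample values of $x$ and $y$ (the sample values are arbitrary). The sketch also confirms that none of them is a right inverse, since $AB$ always has a zero last row.

```python
def mat_mul(a, b):
    """Product AB with [AB]_ij = sum_k a_ik * b_kj."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[1, 0], [0, 1], [0, 0]]  # the 3x2 matrix of Example 1.12
I2 = [[1, 0], [0, 1]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

for x in (-2, 0, 5):      # arbitrary sample values
    for y in (1, 7):
        B = [[1, 0, x], [0, 1, y]]
        assert mat_mul(B, A) == I2   # B is a left inverse of A
        assert mat_mul(A, B) != I3   # but AB != I_3 (its last row is zero)
```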


Definition 1.12 An $n \times n$ square matrix $A$ is said to be invertible (or nonsingular) if there exists a square matrix $B$ of the same size such that
\[
AB = I_n = BA.
\]
Such a matrix $B$ is called the inverse of $A$, and is denoted by $A^{-1}$. A matrix $A$ is said to be singular if it is not invertible.




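Lemma 1.7 implies that the inverse matrix of a square matrix is unique. That is why we call $B$ ‘the’ inverse of $A$. For instance, consider a $2 \times 2$ matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$. If $ad - bc \neq 0$, then it is easy to verify that
\[
A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \begin{bmatrix} \frac{d}{ad - bc} & \frac{-b}{ad - bc} \\ \frac{-c}{ad - bc} & \frac{a}{ad - bc} \end{bmatrix},
\]
since $AA^{-1} = I_2 = A^{-1}A$. Note that any square zero matrix is singular.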
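The $2 \times 2$ formula can be turned into a short function with exact arithmetic. This is a sketch: the function name `inverse_2x2` is hypothetical, it assumes integer (or `Fraction`-compatible) entries, and the test matrix is an illustrative choice with $ad - bc = 1$.

```python
from fractions import Fraction


def inverse_2x2(a):
    """Inverse of [[a, b], [c, d]] by the formula above; requires ad - bc != 0."""
    (p, q), (r, s) = a
    det = p * s - q * r
    if det == 0:
        raise ValueError("ad - bc = 0: the matrix is singular")
    return [[Fraction(s, det), Fraction(-q, det)],
            [Fraction(-r, det), Fraction(p, det)]]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[2, 1], [5, 3]]  # illustrative: ad - bc = 1
A_inv = inverse_2x2(A)
print(A_inv)  # [[Fraction(3, 1), Fraction(-1, 1)], [Fraction(-5, 1), Fraction(2, 1)]]
assert mat_mul(A, A_inv) == [[1, 0], [0, 1]] == mat_mul(A_inv, A)
```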


Problem 1.14 Let $A$ be an invertible matrix and $k$ any nonzero scalar. Show that
(1) $A^{-1}$ is invertible and $(A^{-1})^{-1} = A$;
(2) the matrix $kA$ is invertible and $(kA)^{-1} = \frac{1}{k}A^{-1}$;
(3) $A^T$ is invertible and $(A^T)^{-1} = (A^{-1})^T$.

Theorem 1.8 The product of invertible matrices is also invertible, whose inverse is the product of the individual inverses in reversed order:
\[
(AB)^{-1} = B^{-1}A^{-1}.
\]
Proof: Suppose that $A$ and $B$ are invertible matrices of the same size. Then
\[
(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I,
\]
and similarly $(B^{-1}A^{-1})(AB) = I$. Thus, $AB$ has the inverse $B^{-1}A^{-1}$.
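Theorem 1.8 can be illustrated with two invertible $2 \times 2$ matrices (arbitrary illustrative choices). The sketch reuses the $2 \times 2$ inverse formula under a hypothetical name and also shows that, for this pair, the inverses taken in the original order give a different matrix.

```python
from fractions import Fraction


def inverse_2x2(a):
    """Inverse of [[a, b], [c, d]] via (1/(ad - bc)) [[d, -b], [-c, a]]."""
    (p, q), (r, s) = a
    det = p * s - q * r
    if det == 0:
        raise ValueError("singular matrix")
    return [[Fraction(s, det), Fraction(-q, det)],
            [Fraction(-r, det), Fraction(p, det)]]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


A = [[2, 1], [5, 3]]  # illustrative invertible matrices
B = [[1, 2], [3, 4]]
lhs = inverse_2x2(mat_mul(A, B))
assert lhs == mat_mul(inverse_2x2(B), inverse_2x2(A))  # (AB)^-1 = B^-1 A^-1
assert lhs != mat_mul(inverse_2x2(A), inverse_2x2(B))  # the order matters here
```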


The inverse of $A$ is written as ‘$A$ to the power $-1$’, so one can give a meaning to $A^k$ for any integer $k$: Let $A$ be a square matrix. Define $A^0 = I$. Then, for any positive integer $k$, we define the power $A^k$ of $A$ inductively as
\[
A^k = A(A^{k-1}).
\]
Moreover, if $A$ is invertible, then the negative integer power is defined as
\[
A^{-k} = (A^{-1})^k \quad \text{for } k > 0.
\]
It is easy to check that $A^{k+\ell} = A^kA^\ell$ whenever the right-hand side is defined. (If $A$ is not invertible, $A^{k+(-1)} = A^{k-1}$ is defined for $k \geq 1$, but $A^{-1}$ is not.)


Problem 1.15 Prove: 
(1) If A has a zero row, so does AB. 
(2) If B has a zero column, so does AB. 
(3) Any matrix with a zero row or a zero column cannot be invertible. 


Problem 1.16 Let A be an invertible matrix. Is it true that (A*)" = (AT)! for 
any integer k? Justify your answer. 




1.7 Elementary matrices and computing $A^{-1}$


We now return to the system of linear equations $A\mathbf{x} = \mathbf{b}$. If $A$ has a right inverse $B$ so that $AB = I_m$, then $\mathbf{x} = B\mathbf{b}$ is a solution of the system since
\[
A\mathbf{x} = A(B\mathbf{b}) = (AB)\mathbf{b} = \mathbf{b}.
\]
(Compare with Problem 1.23.) In particular, if $A$ is an invertible square matrix, then it has only one inverse $A^{-1}$, and $\mathbf{x} = A^{-1}\mathbf{b}$ is the only solution of the system. In this section, we discuss how to compute $A^{-1}$ when $A$ is invertible.

Recall that Gaussian elimination is a process in which the augmented matrix is transformed into its row-echelon form by a finite number of elementary row operations. In the following, one can see that each elementary row operation can be expressed as a nonsingular matrix, called an elementary matrix, so that the process of Gaussian elimination is the same as multiplying the augmented matrix on the left by a finite number of corresponding elementary matrices.


Definition 1.13 An elementary matrix is a matrix obtained from the identity matrix $I_n$ by executing only one elementary row operation.


For example, the following matrices are three elementary matrices corresponding to each type of the three elementary row operations:

(E1) $\begin{bmatrix} 1 & 0 \\ 0 & -5 \end{bmatrix}$: the second row of $I_2$ is multiplied by $-5$;

(E2) $\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}$: the second and the fourth rows of $I_4$ are interchanged;

(E3) $\begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$: 3 times the third row is added to the first row of $I_3$.


It is an interesting fact that, if $E$ is an $m \times m$ elementary matrix, then for any $m \times n$ matrix $A$, the product $EA$ is exactly the matrix that results when the same elementary row operation that produced $E$ from $I_m$ is executed on $A$. The following example illustrates this. (Note that $AE$ is not what we want; for this, see Problem 1.18.)


Example 1.13 (Elementary operation by an elementary matrix) Let $\mathbf{b} = [b_1 \ b_2 \ b_3]^T$ be a $3 \times 1$ column matrix. Suppose that we want to execute the elementary operation (E3), ‘adding $(-2) \times$ the first row to the second row’, on the matrix $\mathbf{b}$. First, we execute this operation on the identity matrix $I_3$ to get an elementary matrix $E$:
\[
E = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]
Multiplying $\mathbf{b}$ on the left by this elementary matrix $E$ produces the desired result:
\[
E\mathbf{b} = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 - 2b_1 \\ b_3 \end{bmatrix}.
\]
Similarly, the elementary operation (E2), ‘interchanging the first and the third rows’, on the matrix $\mathbf{b}$ can be achieved by multiplying $\mathbf{b}$ on the left by the elementary matrix $P$ obtained from $I_3$ by interchanging those two rows:
\[
P\mathbf{b} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} b_3 \\ b_2 \\ b_1 \end{bmatrix}.
\]

Recall that each elementary row operation has an inverse operation, which is also an elementary operation, that brings the elementary matrix back to the original identity matrix. In other words, if $E$ denotes an elementary matrix and if $E'$ denotes the elementary matrix corresponding to the ‘inverse’ elementary row operation of $E$, then $E'E = I$, because

(1) if $E$ multiplies a row by $c \neq 0$, then $E'$ multiplies the same row by $\frac{1}{c}$;

(2) if $E$ interchanges two rows, then $E'$ interchanges them again;

(3) if $E$ adds a multiple of one row to another, then $E'$ subtracts it back from the same row.

Furthermore, one can say that $E'E = I = EE'$: every elementary matrix is invertible and its inverse matrix $E^{-1} = E'$ is also an elementary matrix.


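Example 1.14 (Inverse of an elementary matrix) If
\[
E_1 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & c & 0 \\ 0 & 0 & 1 \end{bmatrix} \ (c \neq 0), \quad
E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}, \quad
E_3 = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix},
\]
then
\[
E_1^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/c & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad
E_2^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{bmatrix}, \quad
E_3^{-1} = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]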
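The three inverse pairs of Example 1.14 can be verified by multiplication. In this sketch the scalar $c$ is given the illustrative value 5 (any nonzero value works), and `Fraction` keeps $1/c$ exact.

```python
from fractions import Fraction


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
c = Fraction(5)  # any nonzero scalar; 5 is an illustrative choice

pairs = [
    # (E, E') where E' undoes the row operation of E
    ([[1, 0, 0], [0, c, 0], [0, 0, 1]], [[1, 0, 0], [0, 1 / c, 0], [0, 0, 1]]),
    ([[1, 0, 0], [0, 1, 0], [3, 0, 1]], [[1, 0, 0], [0, 1, 0], [-3, 0, 1]]),
    ([[0, 1, 0], [1, 0, 0], [0, 0, 1]], [[0, 1, 0], [1, 0, 0], [0, 0, 1]]),
]
for E, E_inv in pairs:
    assert mat_mul(E_inv, E) == I3 == mat_mul(E, E_inv)
```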




Definition 1.14 A permutation matrix is a square matrix obtained from 
the identity matrix by permuting the rows. 


In Example 1.14, $E_3$ is a permutation matrix, but $E_2$ (and $E_1$ for $c \neq 1$) is not.


Problem 1.17 Prove: 

(1) A permutation matrix is the product of a finite number of elementary ma- 
trices each of which is corresponding to the ‘row-interchanging’ elementary 
row operation. 

(2) Every permutation matrix $P$ is invertible and $P^{-1} = P^T$.

(3) The product of any two permutation matrices is a permutation matrix. 
(4) The transpose of a permutation matrix is also a permutation matrix. 


Problem 1.18 Define the elementary column operations for a matrix by just 
replacing ‘row’ by ‘column’ in the definition of the elementary row operations. 
Show that if $A$ is an $m \times n$ matrix and if $E$ is a matrix obtained by executing an elementary column operation on $I_n$, then $AE$ is exactly the matrix that is obtained from $A$ when the same column operation is executed on $A$.


The next theorem establishes some fundamental relations between nxn 
square matrices and systems of n linear equations in n unknowns. 


Theorem 1.9 Let $A$ be an $n \times n$ matrix. The following are equivalent:
(1) $A$ has a left inverse;
(2) $A\mathbf{x} = \mathbf{0}$ has only the trivial solution $\mathbf{x} = \mathbf{0}$;
(3) $A$ is row-equivalent to $I_n$;
(4) $A$ is a product of elementary matrices;
(5) $A$ is invertible;
(6) $A$ has a right inverse.


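Proof: (1) $\Rightarrow$ (2): Let $\mathbf{x}$ be a solution of the homogeneous system $A\mathbf{x} = \mathbf{0}$, and let $B$ be a left inverse of $A$. Then
\[
\mathbf{x} = I_n\mathbf{x} = (BA)\mathbf{x} = B(A\mathbf{x}) = B\mathbf{0} = \mathbf{0}.
\]
(2) $\Rightarrow$ (3): Suppose that the homogeneous system $A\mathbf{x} = \mathbf{0}$ has only the trivial solution $\mathbf{x} = \mathbf{0}$:
\[
x_1 = 0, \quad x_2 = 0, \quad \ldots, \quad x_n = 0.
\]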




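This means that the augmented matrix $[A \ \mathbf{0}]$ of the system $A\mathbf{x} = \mathbf{0}$ is reduced to $[I_n \ \mathbf{0}]$ by Gauss-Jordan elimination: otherwise the reduced row-echelon form of $A$ would have fewer than $n$ leading 1’s, the system would have a free variable, and hence it would have infinitely many solutions. Hence, $A$ is row-equivalent to $I_n$.

(3) $\Rightarrow$ (4): Assume $A$ is row-equivalent to $I_n$, so that $A$ can be reduced to $I_n$ by a finite sequence of elementary row operations. Thus, one can find elementary matrices $E_1, E_2, \ldots, E_k$ such that
\[
E_k \cdots E_2E_1A = I_n.
\]
By multiplying $E_k^{-1}, \ldots, E_2^{-1}, E_1^{-1}$ successively to both sides of this equation on the left, we obtain
\[
A = E_1^{-1}E_2^{-1} \cdots E_k^{-1}I_n = E_1^{-1}E_2^{-1} \cdots E_k^{-1},
\]
which expresses $A$ as the product of elementary matrices, since the inverse of an elementary matrix is again an elementary matrix.

(4) $\Rightarrow$ (5) is trivial, because any elementary matrix is invertible and a product of invertible matrices is invertible (Theorem 1.8). In fact, $A^{-1} = E_k \cdots E_2E_1$.

(5) $\Rightarrow$ (1) and (5) $\Rightarrow$ (6) are trivial.

(6) $\Rightarrow$ (5): If $B$ is a right inverse of $A$, then $A$ is a left inverse of $B$ and one can apply (1) $\Rightarrow$ (2) $\Rightarrow$ (3) $\Rightarrow$ (4) $\Rightarrow$ (5) to $B$ and conclude that $B$ is invertible, with $A$ as its unique inverse, by Lemma 1.7. That is, $B$ is the inverse of $A$ and so $A$ is invertible.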


If a triangular matrix A has a zero diagonal entry, then the system 
Ax = 0 has at least one free variable, so that it has infinitely many solutions. 
Hence, one can have the following corollary. 


Corollary 1.10 A triangular matrix is invertible if and only if it has no 
zero diagonal entry. 


Theorem 1.9 shows that a square matrix is invertible if it has a one-sided inverse. In particular, if a square matrix $A$ is invertible, then $\mathbf{x} = A^{-1}\mathbf{b}$ is the unique solution to the system $A\mathbf{x} = \mathbf{b}$.


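Problem 1.19 Find the inverse of the product
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -c & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -b & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ -a & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]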




As an application of Theorem 1.9, one can find a practical method for finding the inverse $A^{-1}$ of an invertible $n \times n$ matrix $A$. If $A$ is invertible, then $A$ is row-equivalent to $I_n$ and so there are elementary matrices $E_1, E_2, \ldots, E_k$ such that $E_k \cdots E_2E_1A = I_n$. Hence,
\[
A^{-1} = E_k \cdots E_2E_1 = E_k \cdots E_2E_1I_n.
\]
This means that $A^{-1}$ can be obtained by performing on $I_n$ the same sequence of elementary row operations that reduces $A$ to $I_n$. Practically, one first constructs the $n \times 2n$ augmented matrix $[A \mid I_n]$ and then performs on it a Gauss-Jordan elimination that reduces $A$ to $I_n$, to get $[I_n \mid A^{-1}]$: that is,
\[
[A \mid I_n] \to [E_\ell \cdots E_1A \mid E_\ell \cdots E_1I_n] = [U \mid K] \to [F_s \cdots F_1U \mid F_s \cdots F_1K] = [I_n \mid A^{-1}],
\]
where $E_\ell \cdots E_1$ represents a Gaussian elimination that reduces $A$ to a row-echelon form $U$ and $F_s \cdots F_1$ represents the back substitution. The following example illustrates the computation of an inverse matrix.


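Example 1.15 (Computing A^{-1}) Find the inverse of

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 5 \\ 1 & 0 & 2 \end{bmatrix}.$$

Solution: Apply Gauss-Jordan elimination to

$$[A \mid I] = \left[\begin{array}{rrr|rrr} 1 & 2 & 3 & 1 & 0 & 0 \\ 2 & 3 & 5 & 0 & 1 & 0 \\ 1 & 0 & 2 & 0 & 0 & 1 \end{array}\right]
\xrightarrow{\substack{(-2)\,\text{row } 1 + \text{row } 2 \\ (-1)\,\text{row } 1 + \text{row } 3}}
\left[\begin{array}{rrr|rrr} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & -1 & -1 & -2 & 1 & 0 \\ 0 & -2 & -1 & -1 & 0 & 1 \end{array}\right]$$

$$\xrightarrow{(-1)\,\text{row } 2}
\left[\begin{array}{rrr|rrr} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 1 & 2 & -1 & 0 \\ 0 & -2 & -1 & -1 & 0 & 1 \end{array}\right]
\xrightarrow{(2)\,\text{row } 2 + \text{row } 3}
\left[\begin{array}{rrr|rrr} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & 1 & 2 & -1 & 0 \\ 0 & 0 & 1 & 3 & -2 & 1 \end{array}\right].$$

This is [U | K] obtained by Gaussian elimination. Now continue the back
substitution to reduce [U | K] to [I | A^{-1}]:

$$[U \mid K] \xrightarrow{\substack{(-1)\,\text{row } 3 + \text{row } 2 \\ (-3)\,\text{row } 3 + \text{row } 1}}
\left[\begin{array}{rrr|rrr} 1 & 2 & 0 & -8 & 6 & -3 \\ 0 & 1 & 0 & -1 & 1 & -1 \\ 0 & 0 & 1 & 3 & -2 & 1 \end{array}\right]
\xrightarrow{(-2)\,\text{row } 2 + \text{row } 1}
\left[\begin{array}{rrr|rrr} 1 & 0 & 0 & -6 & 4 & -1 \\ 0 & 1 & 0 & -1 & 1 & -1 \\ 0 & 0 & 1 & 3 & -2 & 1 \end{array}\right] = [I \mid A^{-1}].$$

Thus, we get

$$A^{-1} = \begin{bmatrix} -6 & 4 & -1 \\ -1 & 1 & -1 \\ 3 & -2 & 1 \end{bmatrix}.$$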


(The reader should verify that AA^{-1} = I = A^{-1}A.)
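The following Python sketch applies the [A | I_n] procedure to the matrix of Example 1.15. It is a minimal illustration, not the book's own method. The function name `inverse_gauss_jordan` is ours, and exact arithmetic comes from the standard `fractions` module. The loop clears each pivot column above and below in a single pass rather than separating forward elimination from back substitution; the result is the same. A row interchange is made only when a zero pivot appears, and the function returns `None` when no pivot can be found, i.e., when A is not invertible.

```python
from fractions import Fraction

def inverse_gauss_jordan(A):
    """Return the inverse of the square matrix A (a list of rows) by
    Gauss-Jordan elimination on [A | I], or None if A is not invertible."""
    n = len(A)
    # Build the augmented matrix [A | I] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below `col` with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None  # no pivot: A is not row-equivalent to I_n
        M[col], M[pivot] = M[pivot], M[col]  # row interchange (if pivot != col)
        # Scale the pivot row so that the pivot becomes 1.
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # Eliminate this column from every other row, above and below.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # The right half of [I_n | A^{-1}] is the inverse.
    return [row[n:] for row in M]

A = [[1, 2, 3], [2, 3, 5], [1, 0, 2]]   # matrix of Example 1.15
A_inv = inverse_gauss_jordan(A)
print([[str(x) for x in row] for row in A_inv])
# [['-6', '4', '-1'], ['-1', '1', '-1'], ['3', '-2', '1']]
```

For the non-invertible matrix discussed next, with rows (1, 6, 4), (2, 4, −1), (−1, 2, 5), a zero row appears on the left side and the function returns `None`.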


Note that if A is not invertible, then, at some step in Gaussian elimination,
a zero row will show up on the left side in [U | K]. For example, the matrix

$$A = \begin{bmatrix} 1 & 6 & 4 \\ 2 & 4 & -1 \\ -1 & 2 & 5 \end{bmatrix} \quad \text{is row-equivalent to} \quad \begin{bmatrix} 1 & 6 & 4 \\ 0 & -8 & -9 \\ 0 & 0 & 0 \end{bmatrix},$$

which is a non-invertible matrix.


From Theorem 1.9, a square matrix A is invertible if and only if Ax = 0
has only the trivial solution. That is, a square matrix A is non-invertible if
and only if Ax = 0 has a non-trivial solution, say x_0. Now, for any column
vector b = [b_1 ⋯ b_n]^T, if x_1 is a solution of Ax = b for a non-invertible
matrix A, then so is kx_0 + x_1 for any k, since

$$A(k x_0 + x_1) = k(A x_0) + A x_1 = k \cdot 0 + b = b.$$

This argument strengthens Theorem 1.6 as follows when A is a square
matrix:

Theorem 1.11 If A is an invertible n × n matrix, then for any column
vector b = [b_1 ⋯ b_n]^T, the system Ax = b has exactly one solution
x = A^{-1}b. If A is not invertible, then the system has either no solution or
infinitely many solutions, according to whether the system is inconsistent or consistent.
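The argument before Theorem 1.11 can be checked numerically. The sketch below uses the non-invertible matrix A with rows (1, 6, 4), (2, 4, −1), (−1, 2, 5) from the example above. The nontrivial solution x_0 = (22, −9, 8) of Ax = 0 and the particular solution x_1 = (1, 0, 0) are our own choices for illustration, and b is taken to be Ax_1.

```python
def mat_vec(A, x):
    """Multiply a matrix A (list of rows) by a vector x (list)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 6, 4], [2, 4, -1], [-1, 2, 5]]   # non-invertible matrix from the text
x0 = [22, -9, 8]                           # a nontrivial solution of Ax = 0
x1 = [1, 0, 0]                             # a particular solution of Ax = b ...
b = mat_vec(A, x1)                         # ... for this choice of b

assert mat_vec(A, x0) == [0, 0, 0]
for k in range(-3, 4):
    x = [k * u + v for u, v in zip(x0, x1)]   # k*x0 + x1
    assert mat_vec(A, x) == b                 # every such x solves Ax = b
print(b)   # [1, 2, -1]
```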




Problem 1.20 Write A^{-1} as a product of elementary matrices for A in Example 1.15.

Problem 1.21 When is a diagonal matrix

$$D = \begin{bmatrix} d_1 & & 0 \\ & \ddots & \\ 0 & & d_n \end{bmatrix}$$

nonsingular, and what is D^{-1}?


Problem 1.22 Write the system of linear equations

$$\begin{aligned} x + 2y + 2z &= 10 \\ 2x - 4y + 3z &= 1 \\ 4x - 3y + 5z &= 4 \end{aligned}$$

in matrix form Ax = b and solve it by finding A^{-1}b.


Problem 1.23 True or false: If the matrix A has a left inverse C so that CA = I_n,
then x = Cb is a solution of the system Ax = b. Justify your answer.


Remark: The usual number of operations required to compute A^{-1} by
Gauss-Jordan elimination on [A | I] is 4n^3/3: about n^3/3 steps for A and n^2 steps
for each of the n columns of I.
However, the j-th column of I is e_j, which means that at the j-th place
in forward elimination there are only n − j components to be changed, so
that (n − j)^2/2 operations are needed. Summing over all j, the total forward
elimination takes about n^3/6 operations. The back substitutions for the n columns of the
right side take n · n^2/2 = n^3/2 operations. Together with the n^3/3 operations acting on A, the
total number of operations needed for A^{-1} is

$$\frac{n^3}{6} + \frac{n^3}{3} + \frac{n^3}{2} = n^3.$$

This count is remarkably low, since even a single matrix multiplication already
requires n^3 steps. Nevertheless, it is not recommended to compute A^{-1} if it
is not necessary.
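The forward-elimination estimate in the remark rests on a sum of squares. Only the leading term matters for the count:

$$\sum_{j=1}^{n} \frac{(n-j)^2}{2} = \frac{1}{2}\sum_{m=0}^{n-1} m^2 = \frac{(n-1)\,n\,(2n-1)}{12} \approx \frac{n^3}{6}.$$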


1.8 LDU decomposition 


In this section, we show that the forward elimination for solving a system of 
linear equations Ax = b can be expressed by some invertible lower triangular 
matrix, so that the matrix A can be factorized as a product of two or more 
matrices. 


1.8. LDU decomposition 35 


We first assume that no permutations of rows (that is, no row interchanges) are necessary
throughout the whole process of forward elimination on the augmented matrix [A b].
Then the forward elimination is just the multiplication of the augmented matrix [A b]
on the left by finitely many elementary matrices E_k, ..., E_1, each of the type that adds
a multiple of one row to a row below it: that is,

$$[E_k \cdots E_1 A \mid E_k \cdots E_1 b] = [U \mid y],$$

where each E_i is a lower triangular matrix whose diagonal entries are all 1's,
and U is a row-echelon form of A obtained without dividing the rows by the pivots.
(Note that if A is a square matrix, then U must be an upper triangular matrix.)
Therefore, if we set L = (E_k ⋯ E_1)^{-1} = E_1^{-1} ⋯ E_k^{-1}, then we have A = LU,
where L is a lower triangular matrix whose diagonal entries are all 1's. (In fact,
each E_i^{-1} is also a lower triangular matrix, and a product of lower triangular
matrices is also lower triangular. See Problem 1.25.) Such a factorization A = LU is
called an LU decomposition or an LU factorization of A. For example,

$$A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ * & 1 & 0 & 0 \\ * & * & 1 & 0 \\ * & * & * & 1 \end{bmatrix} \begin{bmatrix} d_1 & * & * & * & * & * \\ 0 & 0 & d_2 & * & * & * \\ 0 & 0 & 0 & 0 & d_3 & * \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} = LU,$$

where the d_i's are the pivots.
Now, let A = LU be an LU decomposition. Then the system Ax = b
can be written as LUx = b. Setting y = Ux, the system

$$Ax = LUx = b$$

can be solved by the following two steps:


Step 1: Solve Ly = b for y. 
Step 2: Solve Ux = y for x by back substitution. 
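The two steps can be sketched in Python for a square U with nonzero pivots. Example 1.16 below, where U has a free column, instead needs the parametric back substitution shown there. The function names and the 3 × 3 matrices are hypothetical illustrations, not taken from the text. L is unit lower triangular, as in an LU decomposition.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Step 1: solve Ly = b for lower triangular L with nonzero diagonal."""
    n = len(L)
    y = [Fraction(0)] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))   # already-known y_j
        y[i] = (Fraction(b[i]) - s) / L[i][i]
    return y

def back_substitution(U, y):
    """Step 2: solve Ux = y for square upper triangular U with nonzero diagonal."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))   # already-known x_j
        x[i] = (Fraction(y[i]) - s) / U[i][i]
    return x

# Hypothetical example: LU = A with A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]].
L = [[1, 0, 0], [2, 1, 0], [-1, -1, 1]]
U = [[2, 1, 1], [0, -8, -2], [0, 0, 1]]
b = [5, -2, 9]

y = forward_substitution(L, b)   # Step 1
x = back_substitution(U, y)      # Step 2
print([str(v) for v in y], [str(v) for v in x])   # ['5', '-12', '2'] ['1', '1', '2']
```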


The following example illustrates the convenience of an LU decomposition
of a matrix A for solving the system Ax = b.


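Example 1.16 Solve the system of linear equations

$$Ax = \begin{bmatrix} 2 & 1 & 1 & 0 \\ 4 & 1 & 0 & 1 \\ -2 & 2 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} 1 \\ -2 \\ 7 \end{bmatrix} = b$$

by using an LU decomposition of A.

Solution: The elementary matrices for the forward elimination on the augmented
matrix [A b] are easily found to be

$$E_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix} \quad \text{and} \quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 3 & 1 \end{bmatrix},$$

so that

$$E_3 E_2 E_1 A = \begin{bmatrix} 2 & 1 & 1 & 0 \\ 0 & -1 & -2 & 1 \\ 0 & 0 & -4 & 4 \end{bmatrix} = U.$$

Thus, if we set

$$L = E_1^{-1} E_2^{-1} E_3^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & -3 & 1 \end{bmatrix},$$

which is a lower triangular matrix with 1's on the diagonal, then

$$A = LU = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & -3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 1 & 0 \\ 0 & -1 & -2 & 1 \\ 0 & 0 & -4 & 4 \end{bmatrix}.$$

Now, the system

$$Ly = b: \quad \begin{aligned} y_1 &= 1 \\ 2y_1 + y_2 &= -2 \\ -y_1 - 3y_2 + y_3 &= 7 \end{aligned}$$

can be easily solved inductively to get y = (1, −4, −4), and the system

$$Ux = y: \quad \begin{aligned} 2x_1 + x_2 + x_3 &= 1 \\ -x_2 - 2x_3 + x_4 &= -4 \\ -4x_3 + 4x_4 &= -4 \end{aligned}$$

can also be solved by back substitution to get

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} -1 \\ 2 - t \\ 1 + t \\ t \end{bmatrix} = \begin{bmatrix} -1 \\ 2 \\ 1 \\ 0 \end{bmatrix} + t \begin{bmatrix} 0 \\ -1 \\ 1 \\ 1 \end{bmatrix}$$

for t ∈ R, which are the solutions of the original system.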
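The elimination in Example 1.16 can be reproduced mechanically when, as assumed in this section, no row interchanges are needed. Each multiplier used to clear an entry below a pivot is stored in L. The function name `lu_no_pivoting` is ours. It also handles a rectangular matrix such as the 3 × 4 matrix of the example, skipping a column that has no pivot and stopping with an error if a row interchange would be required.

```python
from fractions import Fraction

def lu_no_pivoting(A):
    """Return (L, U) with A = LU, where L is unit lower triangular (m x m) and
    U is a row-echelon form of A, assuming no row interchanges are needed."""
    m, n = len(A), len(A[0])
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(m)] for i in range(m)]
    r = 0  # row that should receive the next pivot
    for c in range(n):
        if r == m:
            break
        if U[r][c] == 0:
            if any(U[i][c] != 0 for i in range(r + 1, m)):
                raise ValueError("a row interchange is needed")
            continue  # no pivot in this column; move right
        for i in range(r + 1, m):
            mult = U[i][c] / U[r][c]      # multiplier for row i
            L[i][r] = mult                 # record it in L
            U[i] = [a - mult * p for a, p in zip(U[i], U[r])]
        r += 1
    return L, U

A = [[2, 1, 1, 0], [4, 1, 0, 1], [-2, 2, 1, 1]]   # matrix of Example 1.16
L, U = lu_no_pivoting(A)
print([[str(v) for v in row] for row in L])   # [['1', '0', '0'], ['2', '1', '0'], ['-1', '-3', '1']]
print([[str(v) for v in row] for row in U])   # [['2', '1', '1', '0'], ['0', '-1', '-2', '1'], ['0', '0', '-4', '4']]

# Check the one-parameter family of solutions found in the example.
b = [1, -2, 7]
for t in range(-2, 3):
    x = [-1, 2 - t, 1 + t, t]
    assert [sum(a * xi for a, xi in zip(row, x)) for row in A] == b
```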




As shown in Example 1.16, it is a simple computation to solve the sys- 
tems Ly = b and Ux = y, because the matrix L is lower triangular and the 
matrix U is the matrix obtained from A after forward elimination. 


Remark: For a system Ax = b, Gaussian elimination is recorded by an LU
decomposition of the matrix A. This decomposition is quite useful when one
needs to solve several systems Ax = b_i for i = 1, 2, ..., ℓ, since the
Gaussian elimination is determined by A alone, not by the b_i's. Thus, once the
elimination has been carried out and saved as A = LU, it need not be repeated:
for each i = 1, 2, ..., ℓ, first solve Ly_i = b_i, and then the solutions of
Ax = b_i are just those of Ux = y_i.
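To illustrate the remark, the sketch below factors a matrix once and then reuses L and U for several right-hand sides. The factorization is Doolittle-style elimination without row interchanges and assumes every pivot is nonzero. The matrix and right-hand sides are hypothetical. Taking the b_i to be e_1, e_2, e_3 yields the columns of A^{-1}, in the spirit of Problem 1.24(3).

```python
from fractions import Fraction

def lu_square(A):
    """Doolittle LU of a square matrix, assuming no zero pivot arises."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        if U[k][k] == 0:
            raise ValueError("zero pivot: a row interchange would be needed")
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            U[i] = [a - L[i][k] * p for a, p in zip(U[i], U[k])]
    return L, U

def solve_with_lu(L, U, b):
    """Solve LUx = b: first Ly = b (forward), then Ux = y (backward)."""
    n = len(L)
    y = []
    for i in range(n):               # L has 1's on the diagonal
        y.append(Fraction(b[i]) - sum(L[i][j] * y[j] for j in range(i)))
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

A = [[2, 1, 1], [4, -6, 0], [-2, 7, 2]]   # hypothetical example matrix
L, U = lu_square(A)                        # elimination is done only once

# Reuse L and U for several right-hand sides b_1, b_2, ...
for b in ([5, -2, 9], [1, 0, 0], [0, 1, 0], [0, 0, 1]):
    x = solve_with_lu(L, U, b)
    print(b, [str(v) for v in x])
```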


Problem 1.24 Determine an LU decomposition of the matrix

$$A = \begin{bmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix},$$

from which solve Ax = b for (1) b = [1 1 1]^T and (2) b = [2 0 −1]^T. (3) Find the
solutions of Ax = e_i for i = 1, 2, 3, to get A^{-1}.


Problem 1.25 Let A and B be two lower triangular matrices. Prove that 
(1) their product AB is also a lower triangular matrix; 
(2) if A is invertible, then its inverse is also a lower triangular matrix; 
(3) if the diagonal entries of A and B are all 1’s, then the same holds for their 
product AB and their inverses. 
Note that the same holds for upper triangular matrices, and for the product of 
more than two matrices. 


The matrix U in the decomposition A = LU of A can further be factorized
as the product U = DU', where D is a diagonal matrix whose diagonal
entries are the pivots of U or zeros, and U' is a row-echelon form of A with
leading 1's, so that A = LDU'. For example,

$$A = \begin{bmatrix} 1&0&0&0 \\ *&1&0&0 \\ *&*&1&0 \\ *&*&*&1 \end{bmatrix} \begin{bmatrix} d_1&*&*&*&*&* \\ 0&0&d_2&*&*&* \\ 0&0&0&0&d_3&* \\ 0&0&0&0&0&0 \end{bmatrix}
= \begin{bmatrix} 1&0&0&0 \\ *&1&0&0 \\ *&*&1&0 \\ *&*&*&1 \end{bmatrix} \begin{bmatrix} d_1&0&0&0 \\ 0&d_2&0&0 \\ 0&0&d_3&0 \\ 0&0&0&0 \end{bmatrix} \begin{bmatrix} 1&*&*&*&*&* \\ 0&0&1&*&*&* \\ 0&0&0&0&1&* \\ 0&0&0&0&0&0 \end{bmatrix}.$$

For notational convenience, we replace U' again by U and write A = LDU.
This decomposition of A is called an LDU decomposition or an LDU
factorization of A.

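For example, the matrix A in Example 1.16 was factorized as

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ -1 & -3 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 1 & 0 \\ 0 & -1 & -2 & 1 \\ 0 & 0 & -4 & 4 \end{bmatrix} = LU.$$

It can be further factorized as A = LDU by taking

$$\begin{bmatrix} 2 & 1 & 1 & 0 \\ 0 & -1 & -2 & 1 \\ 0 & 0 & -4 & 4 \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -4 \end{bmatrix} \begin{bmatrix} 1 & 1/2 & 1/2 & 0 \\ 0 & 1 & 2 & -1 \\ 0 & 0 & 1 & -1 \end{bmatrix} = DU.$$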
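Splitting U into D and a matrix with leading 1's only requires dividing each nonzero row by its first nonzero entry. The following is a minimal sketch; the function name `split_pivots` is ours. A zero row is left unchanged and receives a 0 on the diagonal of D.

```python
from fractions import Fraction

def split_pivots(U):
    """Factor a row-echelon matrix U as D * U1, where D is diagonal with the
    pivots (0 for a zero row) and U1 has leading 1's."""
    m = len(U)
    D = [[Fraction(0)] * m for _ in range(m)]
    U1 = []
    for i, row in enumerate(U):
        row = [Fraction(x) for x in row]
        pivot = next((x for x in row if x != 0), None)   # first nonzero entry
        if pivot is None:
            U1.append(row)            # zero row stays zero; D[i][i] remains 0
        else:
            D[i][i] = pivot
            U1.append([x / pivot for x in row])
    return D, U1

U = [[2, 1, 1, 0], [0, -1, -2, 1], [0, 0, -4, 4]]   # U from Example 1.16
D, U1 = split_pivots(U)
print([str(D[i][i]) for i in range(len(D))])   # ['2', '-1', '-4']
print([[str(x) for x in row] for row in U1])
# [['1', '1/2', '1/2', '0'], ['0', '1', '2', '-1'], ['0', '0', '1', '-1']]
```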


The LDU decomposition of a matrix A is always possible when no row
interchange is needed in the Gaussian elimination process. In general, if
a permutation matrix for a row interchange is necessary in the Gaussian
elimination process, then an LDU decomposition may not be possible. For
example, the matrix

$$A = \begin{bmatrix} 0 & 1 \\ 2 & 0 \end{bmatrix}$$

has no LU decomposition, since a row interchange is necessary to get a
row-echelon form of A. In fact, one can show that it cannot be expressed as
a product of any lower triangular matrix L and any upper triangular matrix U.
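The claim that A = [0 1; 2 0] has no factorization into lower and upper triangular factors can be verified directly. Multiply a general lower triangular matrix by a general upper triangular one:

$$\begin{bmatrix} \ell_{11} & 0 \\ \ell_{21} & \ell_{22} \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} \\ 0 & u_{22} \end{bmatrix} = \begin{bmatrix} \ell_{11}u_{11} & \ell_{11}u_{12} \\ \ell_{21}u_{11} & \ell_{21}u_{12} + \ell_{22}u_{22} \end{bmatrix}.$$

Matching this product with A, the equation ℓ_11 u_12 = 1 forces ℓ_11 ≠ 0. Then ℓ_11 u_11 = 0 forces u_11 = 0. The (2, 1) entry is then ℓ_21 u_11 = 0, which cannot equal 2.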


Suppose now that a row interchange is necessary during the forward 
elimination on the augmented matrix [A b]. In this case, one can first do all 
the row interchanges before doing any other type of elementary row opera- 
tions, since the interchange of rows can be done at any time, before or after 
the other elementary operations, with the same effect on the solution. Those 
‘row-interchanging’ elementary matrices together form a permutation matrix
P such that no more row interchanges are needed during the forward
elimination on PA. Then the matrix PA has an LDU factorization.


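Example 1.17 Consider the square matrix

$$A = \begin{bmatrix} 0 & 1 & 2 \\ 0 & 1 & 0 \\ 1 & 0 & 2 \end{bmatrix}.$$

For forward elimination, it is clearly necessary to interchange the first row with the third
row; that is, we need to multiply A on the left by the permutation matrix

$$P = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix},$$

so that

$$PA = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \end{bmatrix} \begin{bmatrix} 1 & 0 & 2 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = LDU.$$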


Note that U is a row-echelon form of the matrix A. 
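The same bookkeeping can be automated. The sketch below performs elimination and, whenever the current pivot entry is zero, swaps in the first lower row with a nonzero entry, swapping the multipliers already stored in L along with it. It returns P, L and U with PA = LU. The function names `plu` and `matmul` are ours. Choosing the first nonzero entry mirrors the hand computation; numerical software usually chooses the entry of largest absolute value instead (partial pivoting). Applied to Example 1.17, it selects the same P as the text.

```python
from fractions import Fraction

def plu(A):
    """Return (P, L, U) with PA = LU for a square matrix A, using the first
    nonzero entry in each column as the pivot (raises if A is singular)."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    L = [[Fraction(0)] * n for _ in range(n)]
    perm = list(range(n))          # perm[i] = original row now in position i
    for k in range(n):
        p = next((i for i in range(k, n) if U[i][k] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        if p != k:
            U[k], U[p] = U[p], U[k]
            L[k], L[p] = L[p], L[k]            # move the multipliers found so far
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            U[i] = [a - L[i][k] * b for a, b in zip(U[i], U[k])]
    for i in range(n):
        L[i][i] = Fraction(1)
    P = [[int(perm[i] == j) for j in range(n)] for i in range(n)]
    return P, L, U

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[0, 1, 2], [0, 1, 0], [1, 0, 2]]                # matrix of Example 1.17
P, L, U = plu(A)
D = [[U[i][i] if i == j else 0 for j in range(3)] for i in range(3)]
U1 = [[x / U[i][i] for x in U[i]] for i in range(3)]  # 1's on the diagonal
assert matmul(P, A) == matmul(matmul(L, D), U1)       # PA = LDU
print(P)   # [[0, 0, 1], [0, 1, 0], [1, 0, 0]]
```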


Of course, if we choose a different permutation P’, then the LDU decom- 
position of P’A may be different from that of PA, even if there is another 
permutation matrix P'' that changes P'A to PA. Moreover, as the following
example shows, even if a permutation matrix is not necessary in the
Gaussian elimination, the LDU decomposition of A need not be unique.


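Example 1.18 The matrix

$$B = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & 0 \end{bmatrix}$$

has the LDU factorization

$$B = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & x \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} = LDU$$

for any value x. It shows that a singular matrix B has infinitely many LDU
decompositions.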
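Here is a quick check of Example 1.18. It assumes the factors L = [[1,0,0],[1,1,0],[0,0,1]], D = diag(1, 2, x) and U = [[1,1,0],[0,1,0],[0,0,0]], which is our reading of the example. The product LDU does not depend on the last diagonal entry x of D, because the last row of U is zero. The helper `matmul` is ours.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

B = [[1, 1, 0], [1, 3, 0], [0, 0, 0]]
L = [[1, 0, 0], [1, 1, 0], [0, 0, 1]]
U = [[1, 1, 0], [0, 1, 0], [0, 0, 0]]   # last row zero: x has no effect
for x in (-2, 0, 1, 5, 100):
    D = [[1, 0, 0], [0, 2, 0], [0, 0, x]]
    assert matmul(matmul(L, D), U) == B   # holds for every tested x
print("B = LDU for every tested x")
```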


However, if the matrix is an invertible square matrix and if the permutation
matrix P is fixed when it is necessary, then the matrix PA has a unique
LDU decomposition.


Theorem 1.12 Let A be an invertible matrix. Then for a fixed suitable
permutation matrix P, the matrix PA has a unique LDU decomposition.


Proof: Suppose that PA = L_1 D_1 U_1 = L_2 D_2 U_2, where the L's are lower
triangular and the U's are upper triangular, all with 1's on the diagonal, and the
D's are diagonal matrices with no zeros on the diagonal. One needs to show
L_1 = L_2, D_1 = D_2, and U_1 = U_2 for the uniqueness.

Note that the inverse of a lower (upper) triangular matrix is also a lower
(upper) triangular matrix, and the inverse of a diagonal matrix is also
diagonal. Therefore, by multiplying (L_1 D_1)^{-1} = D_1^{-1} L_1^{-1} on the left and
U_2^{-1} on the right, the equation L_1 D_1 U_1 = L_2 D_2 U_2 becomes

$$U_1 U_2^{-1} = D_1^{-1} L_1^{-1} L_2 D_2.$$

The left side is an upper triangular matrix, while the right side is a lower
triangular matrix. Hence, both sides must be diagonal. However, since the
diagonal entries of the upper triangular matrix U_1 U_2^{-1} are all 1's, it must be
the identity matrix I (see Problem 1.25). Thus U_1 U_2^{-1} = I, i.e., U_1 = U_2.
Similarly, L_1^{-1} L_2 = D_1 D_2^{-1} implies that L_1 = L_2 and D_1 = D_2.


In particular, if an invertible matrix A is symmetric (i.e., A = A^T), and
if it can be factorized into A = LDU without row interchanges, then we
have

$$LDU = A = A^T = (LDU)^T = U^T D^T L^T = U^T D L^T,$$

and thus, by the uniqueness of the decomposition, we have U = L^T and
A = LDL^T.


Problem 1.26 Find the factors L, D, and U for

$$A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}.$$

What is the solution to Ax = b for b = [1 0 −1]^T?


Problem 1.27 For all possible permutation matrices P, find the LDU decomposition
of PA for

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 4 & 2 \\ 1 & 1 & 1 \end{bmatrix}.$$


Remark: If one notices that multiplying a matrix A on the right by the
transpose E^T of an elementary matrix E performs on the columns of A the same
elimination that E performs on its rows, the above result can easily be extended
to general symmetric square matrices, even to non-invertible ones.

Suppose that A is a symmetric square matrix and can be factorized into
A = LDU without row interchanges. Then one can easily see that the Gaussian
elimination L^{-1} applied to A on both rows and columns is written as (L^{-1})A(L^{-1})^T,
and that the result will be a diagonal matrix

$$D = \begin{bmatrix} \Lambda_r & 0 \\ 0 & 0 \end{bmatrix},$$

where Λ_r is the diagonal matrix of order r of nonzero pivots. In this way we can obtain

$$A = LDL^T = L \begin{bmatrix} \Lambda_r & 0 \\ 0 & 0 \end{bmatrix} L^T.$$


1.9. Applications 41 


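Example 1.19 For the symmetric matrix

$$A = \begin{bmatrix} 1 & 1 & \sqrt{2} \\ 1 & 4 & 4\sqrt{2} \\ \sqrt{2} & 4\sqrt{2} & 8 \end{bmatrix},$$

by a direct computation, one can get the following LDL^T factorization:

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ \sqrt{2} & \sqrt{2} & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 1 & \sqrt{2} \\ 0 & 1 & \sqrt{2} \\ 0 & 0 & 1 \end{bmatrix} = LDL^T.$$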
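The following is a minimal LDL^T sketch: symmetric elimination without row interchanges. It assumes that every pivot used as a divisor is nonzero; a zero pivot is harmless only in the last position, as in Example 1.19. The function name `ldlt` is ours. Floating point is used because of the √2 entries, so the comparisons allow a small tolerance.

```python
import math

def ldlt(A):
    """Return (L, d) with A = L diag(d) L^T for a symmetric matrix A, assuming
    no row interchanges are needed and no zero pivot is used as a divisor."""
    n = len(A)
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Pivot: d_j = a_jj - sum_k L_jk^2 d_k.
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = s / d[j]   # ZeroDivisionError if a pivot vanishes early
    return L, d

r2 = math.sqrt(2)
A = [[1, 1, r2], [1, 4, 4 * r2], [r2, 4 * r2, 8]]   # matrix of Example 1.19
L, d = ldlt(A)

expected_L = [[1, 0, 0], [1, 1, 0], [r2, r2, 1]]
assert all(math.isclose(L[i][j], expected_L[i][j], abs_tol=1e-12)
           for i in range(3) for j in range(3))
assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(d, [1, 3, 0]))
```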


1.9 Applications 


1.9.1 Cryptography 


Cryptography is the study of sending messages in disguised form (secret 
codes) so that only the intended recipients can remove the disguise and read 
the message; modern cryptography uses advanced mathematics. As another 
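application of invertible matrices, we introduce a simple coding scheme. Suppose we
associate a prescribed number with every letter in the alphabet; for example,

$$\begin{array}{ccccccccccc} A & B & C & D & \cdots & X & Y & Z & \text{Blank} & ? & ! \\ \updownarrow & \updownarrow & \updownarrow & \updownarrow & & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow & \updownarrow \\ 0 & 1 & 2 & 3 & \cdots & 23 & 24 & 25 & 26 & 27 & 28. \end{array}$$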


Suppose that we want to send the message “GOOD LUCK”. Replace 
this message by 
6, 14, 14, 3, 26, 11, 20, 2, 10 


according to the preceding substitution scheme. A code of this type could be
cracked without difficulty by a number of statistical techniques,
such as analyzing the frequency of letters. To make the code harder to crack,
we first break the message into three vectors in R^3, each with 3 components,
adding extra blanks if necessary:

$$\begin{bmatrix} 6 \\ 14 \\ 14 \end{bmatrix}, \quad \begin{bmatrix} 3 \\ 26 \\ 11 \end{bmatrix}, \quad \begin{bmatrix} 20 \\ 2 \\ 10 \end{bmatrix}.$$


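Next, choose a nonsingular 3 × 3 matrix A, say

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 1 & 1 & 1 \end{bmatrix},$$

which is supposed to be known to both sender and receiver. Then, as a linear
transformation, A translates our message into

$$A\begin{bmatrix} 6 \\ 14 \\ 14 \end{bmatrix} = \begin{bmatrix} 6 \\ 26 \\ 34 \end{bmatrix}, \quad A\begin{bmatrix} 3 \\ 26 \\ 11 \end{bmatrix} = \begin{bmatrix} 3 \\ 32 \\ 40 \end{bmatrix}, \quad A\begin{bmatrix} 20 \\ 2 \\ 10 \end{bmatrix} = \begin{bmatrix} 20 \\ 42 \\ 32 \end{bmatrix}.$$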


By putting the components of the resulting vectors consecutively, we trans- 
mit 


6, 26, 34, 3, 32, 40, 20, 42, 32. 
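The encoding step can be sketched in a few lines of Python. The string `ALPHABET` encodes the letter-to-number table of the text (A = 0, ..., Z = 25, blank = 26, ? = 27, ! = 28). The names `ENC` and `encode` are ours.

```python
# Letter-to-number table from the text: A=0, ..., Z=25, blank=26, ?=27, !=28.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ?!"
ENC = [[1, 0, 0], [2, 1, 0], [1, 1, 1]]   # the matrix A known to both parties

def encode(message, key=ENC):
    """Map letters to numbers, pad with blanks to a multiple of 3,
    and multiply each 3-vector by the key matrix."""
    nums = [ALPHABET.index(ch) for ch in message]
    while len(nums) % 3:
        nums.append(ALPHABET.index(" "))   # pad with blanks
    out = []
    for i in range(0, len(nums), 3):
        v = nums[i:i + 3]
        out.extend(sum(a * x for a, x in zip(row, v)) for row in key)
    return out

print(encode("GOOD LUCK"))   # [6, 26, 34, 3, 32, 40, 20, 42, 32]
```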


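To decode a message, the receiver reverses this process, as follows.
Suppose that we received the following reply from our correspondent:

19, 45, 26, 13, 36, 41.

To decode it, first break the message into two vectors in R^3 as before:

$$\begin{bmatrix} 19 \\ 45 \\ 26 \end{bmatrix}, \quad \begin{bmatrix} 13 \\ 36 \\ 41 \end{bmatrix}.$$

We want to find two vectors x_1, x_2 such that Ax_i is the i-th vector of the
above two vectors: i.e.,

$$A x_1 = \begin{bmatrix} 19 \\ 45 \\ 26 \end{bmatrix}, \quad A x_2 = \begin{bmatrix} 13 \\ 36 \\ 41 \end{bmatrix}.$$

Since A is invertible, the vectors x_1, x_2 can be found by multiplying the two
vectors given in the message on the left by the inverse of A. By an easy computation,
one can find

$$A^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix}.$$

Therefore,

$$x_1 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 19 \\ 45 \\ 26 \end{bmatrix} = \begin{bmatrix} 19 \\ 7 \\ 0 \end{bmatrix}, \quad x_2 = \begin{bmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 13 \\ 36 \\ 41 \end{bmatrix} = \begin{bmatrix} 13 \\ 10 \\ 18 \end{bmatrix}.$$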


1.9. Application: Linear models 43 


The numbers one obtains are 
19, 7, 0, 13, 10, 18. 


Using our correspondence between letters and numbers, the message we have 
received is “THANKS”. 
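Decoding is the same loop with A^{-1} in place of A, followed by the reverse table lookup. The following is a minimal sketch; the names `DEC` and `decode` are ours, and `DEC` is the inverse matrix computed in the text.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ ?!"   # A=0, ..., Z=25, blank, ?, !
DEC = [[1, 0, 0], [-2, 1, 0], [1, -1, 1]]    # A^{-1} from the text

def decode(numbers, inv_key=DEC):
    """Multiply each received 3-vector by A^{-1} and map numbers to letters."""
    letters = []
    for i in range(0, len(numbers), 3):
        v = numbers[i:i + 3]
        for row in inv_key:
            letters.append(ALPHABET[sum(a * x for a, x in zip(row, v))])
    return "".join(letters)

print(decode([19, 45, 26, 13, 36, 41]))   # THANKS
```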


Problem 1.28 Encode “TAKE UFO ” using the same matrix A used in the above 
example. 


1.9.2 Electrical network 


In an electrical network, a simple current flow may be illustrated by a dia- 
gram like the one below. Such a network involves only voltage sources, like 
batteries, and resistors, like bulbs, motors, or refrigerators. The voltage is 
measured in volts, the resistance in ohms, and the current flow in amperes 
(amps, in short). For such an electrical network, current flow is governed by 
the following three laws: 


• Ohm’s Law: The voltage drop V across a resistor is the product of
the current I and the resistance R: V = IR.

• Kirchhoff’s Current Law (KCL): The current flow into a node
equals the current flow out of the node.

• Kirchhoff’s Voltage Law (KVL): The algebraic sum of the voltage
drops around a closed loop equals the total voltage sources in the loop.
Example 1.20 Determine the currents in the network given in Figure 1.4.

Solution: By applying KCL to nodes P and Q, we get the equations

I_1 + I_3 = I_2 at P,
I_2 = I_1 + I_3 at Q.

Observe that both equations are the same, and one of them is redundant.
By applying KVL to each of the loops in the network in the clockwise direction,
we get

6I_1 + 2I_2 = 0 from the left loop,
2I_2 + 3I_3 = 18 from the right loop.




[Figure 1.4: A circuit network (nodes P and Q; 2-ohm and 1-ohm resistors; an 18-volt source; currents I_1, I_2, I_3)]


Collecting all the equations, we get a system of linear equations: 


$$\begin{aligned} I_1 - I_2 + I_3 &= 0 \\ 6I_1 + 2I_2 &= 0 \\ 2I_2 + 3I_3 &= 18. \end{aligned}$$

By solving it, the currents are I_1 = −1 amp, I_2 = 3 amps and I_3 = 4 amps.

The negative sign for I_1 means that the current I_1 flows in the direction
opposite to that shown in the figure.
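The 3 × 3 system of Example 1.20 can be solved with a short Gaussian-elimination routine. The function name `solve` is ours. It swaps rows when a zero pivot is met and assumes that the coefficient matrix is invertible. Exact fractions keep the answer free of rounding.

```python
from fractions import Fraction

def solve(A, b):
    """Solve a square system Ax = b by Gaussian elimination and back
    substitution, swapping rows when a zero pivot is met."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        p = next(i for i in range(k, n) if M[i][k] != 0)   # assumes A invertible
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            M[i] = [a - f * c for a, c in zip(M[i], M[k])]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

# Unknowns (I1, I2, I3): one KCL equation and the two KVL equations.
A = [[1, -1, 1], [6, 2, 0], [0, 2, 3]]
b = [0, 0, 18]
print([str(v) for v in solve(A, b)])   # ['-1', '3', '4']
```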


Problem 1.29 Determine the currents in the networks given in Figure 1.5.

[Figure 1.5: Two circuit networks, (1) and (2) (legible labels: 40 ohms, 30 ohms, 1 ohm, 20 volts, 40 volts, 2 volts, 4 volts; currents I_1, I_2, I_3)]




1.9.3 Leontief Model 


Another significant application of linear algebra is to a mathematical model 
in economics. In most nations, an economic society may be divided into 
many sectors that produce goods or services, such as the automobile indus- 
try, oil industry, steel industry, communication industry, and so on. Then a 
fundamental problem in economics is to find the equilibrium of the supply 
and the demand in the economy. 

There are two kinds of demands for the goods: the intermediate demand
from the industries themselves (or the sectors) that are needed as inputs for 
their own production, and the extra demand from the consumer, the gov- 
ernmental use, surplus production, or exports. Practically, the interrelation 
between the sectors is very complicated, and the connection between the 
extra demand and the production is unclear. A natural question is whether 
there is a production level such that the total amounts produced (or supply) 
will exactly balance the total demand for the production, so that the equality 


{Total output} = {Total demand} 
= {Intermediate demand} + {Extra demand} 


holds. This problem can be described by a system of linear equations, which 
is called the Leontief Input-Output Model. To illustrate this, we show a 
simple example. 

Suppose that a nation’s economy consists of three sectors: I_1 = automobile
industry, I_2 = steel industry, and I_3 = oil industry.

Let x = [x_1 x_2 x_3]^T denote the production vector (or production level) in
R^3, where each entry x_i denotes the total amount (in a common unit such
as ‘dollars’ rather than quantities such as ‘tons’ or ‘gallons’) of the output
that the industry I_i produces per year.

The intermediate demand may be explained as follows. Suppose that,
for the total output x_2 units of the steel industry I_2, 20% is contributed by
the output of I_1, 40% by that of I_2 and 20% by that of I_3. Then we can
write this as a column vector, called a unit consumption vector of I_2:

$$c_2 = \begin{bmatrix} 0.2 \\ 0.4 \\ 0.2 \end{bmatrix}.$$

For example, if I_2 decides to produce 100 units per year, then it will order (or
demand) 20 units from I_1, 40 units from I_2, and 20 units from I_3: i.e., the
consumption vector of I_2 for the production x_2 = 100 units can be written as
a column vector: 100c_2 = [20 40 20]^T. From the concept of the consumption
vector, it is clear that the sum of the decimal fractions in the column c_2 must
be ≤ 1.

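In our example, suppose that the demands (inputs) of the outputs are
given by the following matrix, called an input-output matrix:

$$A = \begin{array}{cc|ccc} & & \multicolumn{3}{c}{\text{output}} \\ & & I_1 & I_2 & I_3 \\ \hline & I_1 & 0.3 & 0.2 & 0.3 \\ \text{input} & I_2 & 0.1 & 0.4 & 0.1 \\ & I_3 & 0.3 & 0.2 & 0.3 \\ \hline & & \uparrow & \uparrow & \uparrow \\ & & c_1 & c_2 & c_3 \end{array}$$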


In this matrix, an industry looks down a column to see how much it needs
from each sector to produce its total output, and it looks across a row to see
where its output goes. For example, the second row says that,
out of the total output x_2 units of the steel industry I_2, as the intermediate
demand, the automobile industry I_1 demands an amount equal to 10% of its output x_1, the steel
industry I_2 demands 40% of its output x_2, and the oil industry I_3 demands
10% of its output x_3. Therefore, it is now easy to see that the intermediate
demand of the economy can be written as

$$Ax = \begin{bmatrix} 0.3 & 0.2 & 0.3 \\ 0.1 & 0.4 & 0.1 \\ 0.3 & 0.2 & 0.3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0.3x_1 + 0.2x_2 + 0.3x_3 \\ 0.1x_1 + 0.4x_2 + 0.1x_3 \\ 0.3x_1 + 0.2x_2 + 0.3x_3 \end{bmatrix}.$$


Suppose that the extra demand in our example is given by d = [d_1 d_2 d_3]^T =
[30 20 10]^T. Then the problem for this economy is to find the production
vector x satisfying the following equation:

$$x = Ax + d.$$

Another form of the equation is (I − A)x = d, where the matrix I − A
is called the Leontief matrix. If I − A is not invertible, then the equation
may have no solution or infinitely many solutions, depending on what d is. If
I − A is invertible, then the equation has the unique solution x = (I − A)^{-1}d.
Now, our example can be written as

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0.3 & 0.2 & 0.3 \\ 0.1 & 0.4 & 0.1 \\ 0.3 & 0.2 & 0.3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} 30 \\ 20 \\ 10 \end{bmatrix}.$$




In this example, it turns out that the matrix I − A is invertible and

$$(I - A)^{-1} = \begin{bmatrix} 2.0 & 1.0 & 1.0 \\ 0.5 & 2.0 & 0.5 \\ 1.0 & 1.0 & 2.0 \end{bmatrix}.$$

Therefore,

$$x = (I - A)^{-1} d = \begin{bmatrix} 90 \\ 60 \\ 70 \end{bmatrix},$$

which gives the total amount of product x_i of the industry I_i for one year
to meet the required demand.
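The example can be reproduced by solving (I − A)x = d directly, without forming the inverse. The sketch below uses Gauss-Jordan elimination with exact fractions. Building each entry from its decimal string keeps 0.3 exactly equal to 3/10. The function name `solve` is ours, and the matrix I − A is assumed to be invertible.

```python
from fractions import Fraction

def solve(M, rhs):
    """Solve Mx = rhs by Gauss-Jordan elimination (M assumed invertible)."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for k in range(n):
        p = next(i for i in range(k, n) if aug[i][k] != 0)
        aug[k], aug[p] = aug[p], aug[k]
        pivot = aug[k][k]
        aug[k] = [v / pivot for v in aug[k]]
        for i in range(n):
            if i != k:
                f = aug[i][k]
                aug[i] = [a - f * c for a, c in zip(aug[i], aug[k])]
    return [row[n] for row in aug]

# Input-output matrix A and extra demand d from the example.
A = [["0.3", "0.2", "0.3"], ["0.1", "0.4", "0.1"], ["0.3", "0.2", "0.3"]]
d = [30, 20, 10]
n = len(A)
# Leontief matrix I - A with exact rational entries.
I_minus_A = [[int(i == j) - Fraction(A[i][j]) for j in range(n)] for i in range(n)]
x = solve(I_minus_A, d)
print([str(v) for v in x])   # ['90', '60', '70']
```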


Remark: (1) Under the usual circumstances, the sum of the entries in a
column of the consumption matrix A is less than one, because a sector should
require less than one unit’s worth of inputs to produce one unit of output.
This actually implies that I − A is invertible and that the production vector x is
feasible, in the sense that the entries of x are all nonnegative, as the following
argument shows.

(2) In general, by using induction one can easily verify that for any
k = 1, 2, ...,

$$(I - A)(I + A + \cdots + A^{k-1}) = I - A^k.$$

If the sums of the column entries of A are all strictly less than one, then
lim_{k→∞} A^k = 0 (see Section 6.4 for the limit of a sequence of matrices).
Thus, we get (I − A)(I + A + ⋯ + A^k + ⋯) = I, that is,

$$(I - A)^{-1} = I + A + A^2 + \cdots.$$

This also gives a practical way of computing (I − A)^{-1}, since by taking k
sufficiently large the right side can be made very close to (I − A)^{-1}. In
Chapter 6, an easier method of computing A^k will be shown.

In summary, if A and d have nonnegative entries and if the sum of the
entries of each column of A is less than one, then I − A is invertible and its
inverse is given by the above formula. Moreover, as the formula shows, the
entries of the inverse are all nonnegative, and so are those of the production
vector x = (I − A)^{-1}d.
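The following is a small numerical sketch of the series, using the input-output matrix of the example. It sums I + A + ⋯ + A^200 and compares the result with the inverse (I − A)^{-1} stated earlier. The number of terms (200) is an arbitrary choice: here every column sum of A is at most 0.8, so the omitted tail is far below the printed precision. Plain floating-point lists are used, and the helper `matmul` is ours.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[0.3, 0.2, 0.3], [0.1, 0.4, 0.1], [0.3, 0.2, 0.3]]
n = len(A)
term = [[float(i == j) for j in range(n)] for i in range(n)]   # A^0 = I
total = [row[:] for row in term]
for _ in range(200):                       # add A^1, ..., A^200
    term = matmul(term, A)
    total = [[s + t for s, t in zip(rs, rt)] for rs, rt in zip(total, term)]

expected = [[2.0, 1.0, 1.0], [0.5, 2.0, 0.5], [1.0, 1.0, 2.0]]   # (I - A)^{-1}
err = max(abs(total[i][j] - expected[i][j]) for i in range(n) for j in range(n))
print(err)   # a tiny number, on the order of floating-point rounding
```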


Problem 1.30 Determine the total demand for industries I_1, I_2 and I_3 for the
input-output matrix A and the extra demand vector d given below:

$$A = \begin{bmatrix} 0.1 & 0.7 & 0.2 \\ 0.5 & 0.1 & 0.6 \\ 0.4 & 0.2 & 0.2 \end{bmatrix} \quad \text{with } d = 0.$$




Problem 1.31 Suppose that an economy is divided into three sectors: I_1 = services,
I_2 = manufacturing industries, and I_3 = agriculture. For each unit of output,
I_1 demands no services from I_1, 0.4 units from I_2, and 0.5 units from I_3. For each
unit of output, I_2 requires 0.1 units from sector I_1 of services, 0.7 units from other
parts in sector I_2, and no product from sector I_3. For each unit of output, I_3
demands 0.8 units of services I_1, 0.1 units of manufacturing products from I_2, and
0.1 units of its own output from I_3. Determine the production level to balance
the economy when 90 units of services, 10 units of manufacturing, and 30 units of
agriculture are required as the extra demand.


1.10 Exercises 


1.1. Which of the following matrices are in row-echelon form or in reduced row- 
echelon form? 


[1 0 00 -3 aoe 
A=j;0 0 1 0 4 |, B= ; 
|o 001 2 0 0 0 13 
00000 
ee 0100 5 
C= D=|0 0 1 1 -4 
0 0 0 1 0? 0001 3 : 
0 0 0 0 0 
TE DA. E 
E=|0 0 1 0 4 F= 
010 2 3 : 0 0 0 1 —i 
0 0 0 0 0 
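1.2. Find a row-echelon form of each matrix.

$$(1)\ \begin{bmatrix} 1 & -3 & 2 & 1 & 2 \\ 3 & -9 & 10 & 2 & 9 \\ 2 & -6 & 4 & 2 & 4 \\ 2 & -6 & 8 & 1 & 7 \end{bmatrix}, \qquad (2)\ \begin{bmatrix} 1 & 2 & 3 & 4 & 5 \\ 2 & 3 & 4 & 5 & 1 \\ 3 & 4 & 5 & 1 & 2 \\ 4 & 5 & 1 & 2 & 3 \\ 5 & 1 & 2 & 3 & 4 \end{bmatrix}.$$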


1.3. Find the reduced row-echelon form of the matrices in Exercise 1.2. 


1.4. Solve the systems of equations by Gauss-Jordan elimination. 


zı + zə + z3 — z4 = —2 
(1) 2%, — @ + @ + t4 = 0 
3a, + 2% — z3 — 4. = 1 

zı + tw + 3x3 — 344 = -8. 


1.10. 


1.5. 


1.6. 


1.7. 


1.8. 


1.9. 


1 
. Let A= | 0 
0 


Exercises 49 
$$(2)\ \begin{cases} 2x - 3y = 8 \\ 4x - 5y + z = 15 \\ 2x + 4z = 1. \end{cases}$$


What are the pivots in each elimination step? 


Which of the following systems has a nontrivial solution? 


z + 2 + 3z = 0 22 + y -— gz = 0 
(1) 2y + 22 = 0 (2) x — Wy - 3z = 0 
r + Wy + 32 = 0 3a + y -— 22 = 0. 
Determine all values of the b_i that make the following system consistent:

$$\begin{aligned} x + y - z &= b_1 \\ 2y + z &= b_2 \\ y - z &= b_3. \end{aligned}$$


Determine the condition on b_i so that the following system has no solution:


6x — 2y + lliz = b 
2r — y + 382 = bz. 


Let A and B be matrices of the same size. 
(1) Show that, if Ax = 0 for all x, then A is the zero matrix. 
(2) Show that, if Ax = Bx for all x, then A = B. 


Compute ABC and CAB, for 


3 
2 -11 
oF f 2 a dz E , C=[|1 -1]. 


. Prove that if A is a 3 × 3 matrix such that AB = BA for every 3 × 3 matrix
B, then A = cI_3 for some constant c.


2 0 
1 3 |. Find A* for all integers k. 
0 1 


. Compute (2A − B)C and CC^T for


10 0 100 2 1 1 
ĘA=]|0 10], B=|-210], C= 1 0 

1 0 1 001 -2 2 1 
f 


. Find the symmetric part and the skew-symmetric part of each of the following 


matrices. 




. Find AA^T and A^T A for the matrix A =


N Ne 
A WO 
O.N 


. Let A^{-1} =


i Ohm 
oe eH 
ew hd 


1 2 
(1) Find a matrix B such that AB = | 0 1 
1 


(2) Find a matrix C such that AC = A^2 + A.


. Find all possible choices of a,b and c so that A = | : ; | has an inverse 


matrix such that A7! = A. 


. Decide whether or not each of the following matrices is invertible. Find the 
inverses for invertible ones. 
> 1 1 1 1 2 -1 
A= », B=]0 2 37, CSE 2 38 
De ee 55 1 2 2 1 
00 0 4 
. Find the inverse of each of the following matrices: 
1-1 2 1200 1k 00 
Pee Oe eee TDA A E ale We aco. | eee 
1 2 4 8 0 0 1k 


. Suppose A is a 2 x 1 matrix and B is a 1 x 2 matrix. Prove that the product 
AB is not invertible. 


. Find three matrices which are row-equivalent to

$$A = \begin{bmatrix} 2 & -1 & 3 & 4 \\ 0 & 1 & 2 & -1 \\ 5 & 2 & -3 & 4 \end{bmatrix}.$$


. Write the following systems of equations as matrix equations Ax = b and
solve them by computing A^{-1}b:
2%, — x42 + 3273 = 2 ti = tT + z3 = 5 
(1) t — 4973 = 5 (2) aq + a - «#3 = -l 
2%, + @ — 223 = 7, 4zı — 322 + 2273 = —3. 


. Find the LDU factorization for each of the following matrices: 


m S 


1.10. 


1.23. 


1.24. 


1.25. 


1.26. 


1.27. 


1.28. 




Find the LDL^T factorization of the following symmetric matrices:


1 2 3 b 
()A=|2 6 8], a=] 5.5): 
3 8 10 
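Solve Ax = b with A = LU, where L and U are given as

$$L = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}, \quad U = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}, \quad b = \begin{bmatrix} 2 \\ -3 \\ 4 \end{bmatrix}.$$

Forward elimination is the same as Lc = b, and back-substitution is Ux = c.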


pr. E 
Le A=|1 4 5 | and b=| 3 
[iar | 5 


(1) Solve Ax = b by Gauss-Jordan elimination. 
(2) Find the LDU factorization of A. 
(3) Write A as a product of elementary matrices. 
(4) Find the inverse of A. 
A square matrix A is said to be nilpotent if A^k = 0 for a positive integer k.
(1) Show that an invertible matrix is not nilpotent.
(2) Show that any triangular matrix with zero diagonal is nilpotent.
(3) Show that if A is nilpotent with A^k = 0, then I − A is invertible with
its inverse I + A + ⋯ + A^{k−1}.

A square matrix A is said to be idempotent if A^2 = A.
(1) Find an example of an idempotent matrix other than 0 or I.
(2) Show that, if a matrix A is both idempotent and invertible, then A = I.
Determine whether the following statements are true or false, in general, and
justify your answers.
(1) Let A and B be row-equivalent square matrices. Then A is invertible
if and only if B is invertible.
(2) Let A be a square matrix such that AA = A. Then A is the identity.
(3) If A and B are invertible matrices such that A^2 = I and B^2 = I, then
(AB)^{-1} = BA.
(4) If A and B are invertible matrices, A + B is also invertible.
(5) If A, B and AB are symmetric, then AB = BA.
(6) If A and B are symmetric and the same size, then AB is also symmetric.
(7) Let AB^T = I. Then A is invertible if and only if B is invertible.
(8) If a square matrix A is not invertible, then neither is AB for any B.
(9) If E_1 and E_2 are elementary matrices, then E_1E_2 = E_2E_1.
(10) The inverse of an invertible upper triangular matrix is upper triangular.
(11) Any invertible matrix A can be written as A = LU, where L is lower
triangular and U is upper triangular.
(12) If A is invertible and symmetric, then A^{-1} is also symmetric.




Chapter 2 


Vector Spaces 


2.1 The n-space R^n and vector spaces


We have seen that Gaussian elimination, which is the most basic technique for
solving a system Ax = b of linear equations, can be written in matrix
notation, and that with basic matrix theory the questions of the existence and
uniqueness of solutions become much easier to answer. In fact, the set
of all the solutions has a kind of mathematical structure, called a vector
space, and with this concept one can characterize the solvability of a system
of linear equations in a more systematic way.

In this chapter, we introduce the notion of a vector space, which is an
abstraction of the usual algebraic structure of the 3-space R^3, and then
elevate our study of a system of linear equations to this framework.

Usually, many physical quantities, such as length, area, mass and temperature,
are described by real numbers as magnitudes. Other physical quantities,
like force or velocity, have directions as well as magnitudes. Such a quantity
with a direction and a magnitude is pictorially represented by an arrow. For
instance, an arrow in the 3-space R^3 is usually drawn from
the origin O to a point x, with its tail at the origin and its head at x. In this
way, an arrow in R^3 is represented by a point

$$x = (x_1, x_2, x_3),$$

where x_i ∈ R, i = 1, 2, 3, which are called the coordinates of x. Such an
arrow is, in mathematical terminology, called a vector, while real numbers
are called scalars. The magnitude of an arrow in R^3 is just the distance
of x from the origin, that is, the length of the vector. We will discuss the
length of a vector later in Chapter 5, while the concept of vectors will be
discussed in this chapter.




54 Chapter 2. Vector Spaces 


For a general definition of vectors, we extract the most basic properties of those arrows in $\mathbb{R}^3$, which are represented by points $x = (x_1, x_2, x_3)$ in $\mathbb{R}^3$. Note that for vectors (or points) in $\mathbb{R}^3$ there are two algebraic operations: the sum of two vectors, and the scalar multiplication of a vector by a scalar. That is, for two vectors $x = (x_1, x_2, x_3)$, $y = (y_1, y_2, y_3)$ in $\mathbb{R}^3$ and a scalar $k$, we define
$$x + y = (x_1 + y_1,\ x_2 + y_2,\ x_3 + y_3), \qquad kx = (kx_1,\ kx_2,\ kx_3).$$
Then a vector $x = (x_1, x_2, x_3)$ in $\mathbb{R}^3$ may be written as
$$x = (x_1, x_2, x_3) = x_1 e_1 + x_2 e_2 + x_3 e_3,$$
where $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$ and $e_3 = (0, 0, 1)$ give the rectangular coordinate system introduced in calculus. The sum of vectors and the scalar multiplication of vectors in the 3-space $\mathbb{R}^3$ are illustrated in Figure 2.1.


[Figure 2.1: A vector sum and a scalar multiplication]


Even though our geometric visualization of vectors does not go beyond the 3-space $\mathbb{R}^3$, the algebraic operations on vectors in $\mathbb{R}^3$ extend to the n-space $\mathbb{R}^n$ for any positive integer $n$. The n-space is defined to be the set of all ordered n-tuples $(a_1, a_2, \ldots, a_n)$ of real numbers, called vectors; i.e.,
$$\mathbb{R}^n = \{(a_1, a_2, \ldots, a_n) : a_i \in \mathbb{R},\ i = 1, 2, \ldots, n\}.$$
For any two vectors $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$ in $\mathbb{R}^n$ and a scalar $k$, the sum $x + y$ and the scalar multiple $kx$ are the vectors in $\mathbb{R}^n$ defined by
$$x + y = (x_1 + y_1,\ x_2 + y_2,\ \ldots,\ x_n + y_n), \qquad kx = (kx_1,\ kx_2,\ \ldots,\ kx_n).$$




It is easy to verify the following arithmetic rules for these operations:


Theorem 2.1 For any scalars $k$ and $\ell$, and vectors $x = (x_1, x_2, \ldots, x_n)$, $y = (y_1, y_2, \ldots, y_n)$, and $z = (z_1, z_2, \ldots, z_n)$ in the n-space $\mathbb{R}^n$, the following rules hold:
(1) $x + y = y + x$,
(2) $x + (y + z) = (x + y) + z$,
(3) there exists $0 = (0, 0, \ldots, 0) \in \mathbb{R}^n$, called the zero vector, such that $x + 0 = x = 0 + x$,
(4) $x + (-1)x = 0$,
(5) $k(x + y) = kx + ky$,
(6) $(k + \ell)x = kx + \ell x$,
(7) $k(\ell x) = (k\ell)x$,
(8) $1x = x$.
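The componentwise definitions of $x + y$ and $kx$ translate directly into code. The Python sketch below (the names `add`, `scale` and `check_rules` are illustrative, not from the text) represents vectors in $\mathbb{R}^n$ as tuples and spot-checks rules (1)-(8) of Theorem 2.1 on random vectors with exact rational entries. A passing check is evidence, not a proof; the proof is the componentwise argument.

```python
import random
from fractions import Fraction

def add(x, y):
    """Componentwise sum x + y of two vectors in R^n, given as tuples."""
    if len(x) != len(y):
        raise ValueError("both vectors must lie in the same R^n")
    return tuple(a + b for a, b in zip(x, y))

def scale(k, x):
    """Scalar multiple kx: each component of x is multiplied by k."""
    return tuple(k * a for a in x)

def check_rules(x, y, z, k, ell):
    """Return True if rules (1)-(8) of Theorem 2.1 hold for these particular inputs."""
    zero = tuple(0 for _ in x)
    return all([
        add(x, y) == add(y, x),                                # (1)
        add(x, add(y, z)) == add(add(x, y), z),                # (2)
        add(x, zero) == x == add(zero, x),                     # (3)
        add(x, scale(-1, x)) == zero,                          # (4)
        scale(k, add(x, y)) == add(scale(k, x), scale(k, y)),  # (5)
        scale(k + ell, x) == add(scale(k, x), scale(ell, x)),  # (6)
        scale(k, scale(ell, x)) == scale(k * ell, x),          # (7)
        scale(1, x) == x,                                      # (8)
    ])

rng = random.Random(0)

def random_fraction():
    return Fraction(rng.randint(-9, 9), rng.randint(1, 9))

for _ in range(100):
    n = rng.randint(1, 6)
    x, y, z = (tuple(random_fraction() for _ in range(n)) for _ in range(3))
    k, ell = random_fraction(), random_fraction()
    assert check_rules(x, y, z, k, ell)
print("rules (1)-(8) held for every sample")
```

Exact fractions are used so that equality tests are not spoiled by floating-point rounding, which could make rules such as (2) or (6) appear to fail.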


We usually identify a vector $(a_1, a_2, \ldots, a_n)$ in the n-space $\mathbb{R}^n$ with an $n \times 1$ column vector
$$\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}.$$


(Sometimes a vector in $\mathbb{R}^n$ is instead identified with a $1 \times n$ row vector; see Section 2.5.) Under the column identification, the matrix sum and the scalar multiplication of column matrices coincide with the operations on vectors in $\mathbb{R}^n$, and Theorem 2.1 is a restatement of Theorem 1.3.

These rules of arithmetic are the most important ones, because they are the only rules we need to manipulate vectors in the n-space $\mathbb{R}^n$. Hence, an (abstract) vector space can be defined in terms of these rules, in such a way that $\mathbb{R}^n$ itself becomes a vector space. In general, a vector space is a set with two operations, an addition and a scalar multiplication, that satisfy the rules (1)-(8) of Theorem 2.1.


Definition 2.1 A (real) vector space is a nonempty set $V$ of elements, called vectors, with two algebraic operations that satisfy the following rules.

(A) There is an operation called vector addition that associates to every pair $x$ and $y$ of vectors in $V$ a unique vector $x + y$ in $V$, called the sum of $x$ and $y$, so that the following rules hold for all vectors $x$, $y$, $z$ in $V$:
(1) $x + y = y + x$ (commutativity of addition),
(2) $x + (y + z) = (x + y) + z\ (= x + y + z)$ (associativity of addition),
(3) there is a unique vector $0$ in $V$, called the zero vector, such that $x + 0 = x = 0 + x$ for all $x \in V$,
(4) for any $x \in V$, there is a vector $\tilde{x} \in V$ such that $x + \tilde{x} = \tilde{x} + x = 0$.

(B) There is an operation called scalar multiplication that associates to each vector $x$ in $V$ and each scalar $k$ a unique vector $kx$ in $V$ so that the following rules hold for all vectors $x$, $y$, $z$ in $V$ and all scalars $k$, $\ell$:
(5) $k(x + y) = kx + ky$ (distributivity with respect to vector addition),
(6) $(k + \ell)x = kx + \ell x$ (distributivity with respect to scalar addition),
(7) $k(\ell x) = (k\ell)x$ (associativity of scalar multiplication),
(8) $1x = x$.


Clearly, the n-space $\mathbb{R}^n$ is a vector space by Theorem 2.1. A complex vector space is obtained if we take complex numbers instead of real numbers as scalars. For example, the set $\mathbb{C}^n$ of all ordered n-tuples of complex numbers is a complex vector space. We shall discuss complex vector spaces in Chapter 7; until then we consider only real vector spaces unless otherwise stated.


Example 2.1 (1) For any two positive integers $m$ and $n$, the set $M_{m\times n}(\mathbb{R})$ of all $m \times n$ matrices forms a vector space under the matrix sum and the scalar multiplication defined in Section 1.3. The zero vector in this space is the zero matrix $0_{m\times n}$, and $-A$ is the negative of a matrix $A$.

(2) Let $A$ be an $m \times n$ matrix. Then it is easy to show that the set of solutions of the homogeneous system $Ax = 0$ is a vector space (under the sum and the scalar multiplication of matrices).

(3) Let $C(\mathbb{R})$ denote the set of real-valued continuous functions defined on the real line $\mathbb{R}$. For two functions $f$ and $g$ and a real number $k$, the sum $f + g$ and the scalar multiple $kf$ are defined by
$$(f + g)(x) = f(x) + g(x), \qquad (kf)(x) = k f(x).$$
Then $C(\mathbb{R})$ is a vector space under these operations. The zero vector in this space is the constant function whose value at each point is zero.




(4) Let $S(\mathbb{R})$ denote the set of real-valued functions defined on the set of integers. A function $f \in S(\mathbb{R})$ can be written as a doubly infinite sequence of real numbers
$$\ldots,\ x_{-2},\ x_{-1},\ x_0,\ x_1,\ x_2,\ \ldots,$$
where $x_k = f(k)$ for each $k$. Sequences of this kind appear frequently in engineering, where they are called discrete or digital signals. One can define the sum of two functions and the scalar multiplication of a function by a scalar just as in $C(\mathbb{R})$ in (3), so that $S(\mathbb{R})$ becomes a vector space.
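The operations on $S(\mathbb{R})$ are defined pointwise, exactly as in $C(\mathbb{R})$. The Python sketch below is an illustration, not anything from the text: it represents a signal as a function from integers to floats, and the names `signal_add`, `signal_scale`, `unit_impulse` and `unit_step` are hypothetical. Since only finitely many terms can ever be inspected, the example samples a short window.

```python
from typing import Callable

# A signal is a function f: Z -> R, i.e. a doubly infinite sequence ..., x_{-1}, x_0, x_1, ...
Signal = Callable[[int], float]

def signal_add(f: Signal, g: Signal) -> Signal:
    """Pointwise sum: (f + g)(k) = f(k) + g(k)."""
    return lambda k: f(k) + g(k)

def signal_scale(c: float, f: Signal) -> Signal:
    """Pointwise scalar multiple: (c f)(k) = c * f(k)."""
    return lambda k: c * f(k)

def zero_signal(k: int) -> float:
    """The zero vector of S(R): the sequence whose every term is 0."""
    return 0.0

def unit_impulse(k: int) -> float:
    """..., 0, 0, 1, 0, 0, ... with the 1 at k = 0."""
    return 1.0 if k == 0 else 0.0

def unit_step(k: int) -> float:
    """..., 0, 0, 1, 1, 1, ... with the first 1 at k = 0."""
    return 1.0 if k >= 0 else 0.0

# h = step - 2 * impulse, sampled at k = -2, ..., 2
h = signal_add(unit_step, signal_scale(-2.0, unit_impulse))
print([h(k) for k in range(-2, 3)])   # [0.0, 0.0, -1.0, 1.0, 1.0]
```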


Theorem 2.2 Let $V$ be a vector space and let $x$, $y$ be vectors in $V$. Then
(1) $x + y = y$ implies $x = 0$,
(2) $0x = 0$,
(3) $k0 = 0$ for any $k \in \mathbb{R}$,
(4) $\tilde{x}$ is unique and $\tilde{x} = (-1)x = -x$, which is called the negative of $x$,
(5) if $kx = 0$, then $k = 0$ or $x = 0$.


Proof: (1) By adding $-y$ to both sides of $x + y = y$, we have
$$x = x + 0 = x + y + (-y) = y + (-y) = 0.$$
(2) $0x = (0 + 0)x = 0x + 0x$ implies $0x = 0$ by (1).

(3) This is an easy exercise.

(4) The uniqueness of the negative $\tilde{x}$ of $x$ can be shown by a simple modification of Lemma 1.7. In fact, if $x'$ is another negative of $x$ such that $x + x' = 0$, then
$$\tilde{x} = \tilde{x} + 0 = \tilde{x} + (x + x') = (\tilde{x} + x) + x' = 0 + x' = x'.$$
On the other hand, the equation
$$x + (-1)x = 1x + (-1)x = (1 - 1)x = 0x = 0$$
shows that $(-1)x$ is also a negative of $x$, and hence $\tilde{x} = (-1)x$ by uniqueness. The vector $(-1)x$ is denoted by $-x$.

(5) Suppose $kx = 0$ and $k \neq 0$. Then $x = 1x = \frac{1}{k}(kx) = \frac{1}{k}0 = 0$ by (3).


Problem 2.1 Let $V$ be the set of all pairs $(x, y)$ of real numbers. Suppose that an addition and a scalar multiplication of pairs are defined by
$$(x, y) + (u, v) = (x + 2u,\ y + 2v), \qquad k(x, y) = (kx,\ ky).$$
Is the set $V$ a vector space under these operations? Justify your answer.




2.2 Subspaces 


Definition 2.2 A subset W of a vector space V is called a subspace of V 
if W itself is a vector space under the addition and the scalar multiplication 
defined in V. 


In order to show that a subset $W$ of a vector space $V$ is a subspace, however, it is not necessary to verify all the arithmetic rules in the definition of a vector space. One only needs to check that a given nonempty subset is closed under the vector addition and scalar multiplication of $V$. This is because rules such as commutativity, associativity and distributivity, which hold for all vectors in the larger space, automatically hold for the vectors in every subset.


Theorem 2.3 A nonempty subset $W$ of a vector space $V$ is a subspace if and only if $x + y$ and $kx$ are contained in $W$ (or, equivalently, $x + ky \in W$) for any vectors $x$ and $y$ in $W$ and any scalar $k \in \mathbb{R}$.


Proof: We need only prove sufficiency. Assume both conditions hold and let $x$ be any vector in $W$. Since $W$ is closed under scalar multiplication, $0 = 0x$ and $-x = (-1)x$ are in $W$, so rules (3) and (4) for a vector space hold. All the other rules hold in $W$ because they already hold for all vectors in $V$.


A vector space $V$ itself and the zero subspace $\{0\}$ are trivially subspaces of $V$. Some nontrivial subspaces are given in the following examples.


Example 2.2 Let $W = \{(x, y, z) \in \mathbb{R}^3 : ax + by + cz = 0\}$, where $a$, $b$, $c$ are constants. If $x = (x_1, x_2, x_3)$ and $y = (y_1, y_2, y_3)$ are points in $W$, then $x + y = (x_1 + y_1, x_2 + y_2, x_3 + y_3)$ is also a point in $W$, because
$$a(x_1 + y_1) + b(x_2 + y_2) + c(x_3 + y_3) = (ax_1 + bx_2 + cx_3) + (ay_1 + by_2 + cy_3) = 0.$$
Similarly, $kx$ also lies in $W$ for any scalar $k$. Hence, $W$ is a subspace of $\mathbb{R}^3$; when $a$, $b$, $c$ are not all zero, it is a plane passing through the origin in $\mathbb{R}^3$.
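The closure argument can be spot-checked numerically. In the Python sketch below, the constants $(a, b, c) = (1, 2, -3)$ are a hypothetical choice, and the helper `random_point_on_plane` assumes $c \neq 0$ so that integer points of $W$ are easy to generate. As a contrast, the last lines show that the shifted plane $x + 2y - 3z = 1$, which misses the origin, is not closed under addition.

```python
import random

def in_plane(coeffs, v, rhs=0):
    """True if v = (x, y, z) satisfies a*x + b*y + c*z == rhs, where coeffs = (a, b, c)."""
    a, b, c = coeffs
    x, y, z = v
    return a * x + b * y + c * z == rhs

def random_point_on_plane(coeffs, rng):
    """A random integer point of W = {ax + by + cz = 0}, assuming c != 0."""
    a, b, c = coeffs
    s, t = rng.randint(-10, 10), rng.randint(-10, 10)
    # x = c*s and y = c*t give z = -(a*x + b*y)/c = -(a*s + b*t), an integer
    return (c * s, c * t, -(a * s + b * t))

coeffs = (1, 2, -3)   # hypothetical constants a, b, c
rng = random.Random(1)
for _ in range(200):
    u = random_point_on_plane(coeffs, rng)
    v = random_point_on_plane(coeffs, rng)
    k = rng.randint(-10, 10)
    assert in_plane(coeffs, u) and in_plane(coeffs, v)
    assert in_plane(coeffs, tuple(p + q for p, q in zip(u, v)))   # closed under addition
    assert in_plane(coeffs, tuple(k * p for p in u))              # closed under scaling
print("closure held on all samples")

# Contrast: (1, 0, 0) lies on x + 2y - 3z = 1, but (1, 0, 0) + (1, 0, 0) does not.
print(in_plane(coeffs, (1, 0, 0), rhs=1), in_plane(coeffs, (2, 0, 0), rhs=1))   # True False
```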


Example 2.3 Let $A$ be an $m \times n$ matrix. Then, as shown in Example 2.1(2), the set
$$W = \{x \in \mathbb{R}^n : Ax = 0\}$$
of solutions of the homogeneous system $Ax = 0$ is a vector space. Moreover, since the operations in $W$ and in $\mathbb{R}^n$ coincide, $W$ is a subspace of $\mathbb{R}^n$.


Example 2.4 For a nonnegative integer $n$, let $P_n(\mathbb{R})$ denote the set of all real polynomials in $x$ of degree $\le n$. Then $P_n(\mathbb{R})$ is a subspace of the vector space $C(\mathbb{R})$ of all continuous functions on $\mathbb{R}$.




Example 2.5 Let $W$ be the set of all $n \times n$ real symmetric matrices. Then $W$ is a subspace of the vector space $M_{n\times n}(\mathbb{R})$ of all $n \times n$ matrices, because the sum of two symmetric matrices is symmetric and a scalar multiple of a symmetric matrix is also symmetric. Similarly, the set of all $n \times n$ skew-symmetric matrices is also a subspace of $M_{n\times n}(\mathbb{R})$.


Problem 2.2 Which of the following sets are subspaces of the 3-space $\mathbb{R}^3$? Justify your answer.
(1) $W = \{(x, y, z) \in \mathbb{R}^3 : yz = 0\}$,
(2) $W = \{(2t, 3t, 4t) \in \mathbb{R}^3 : t \in \mathbb{R}\}$,
(3) $W = \{(x, y, z) \in \mathbb{R}^3 : x^2 + y^2 - z^2 = 0\}$,
(4) $W = \{x \in \mathbb{R}^3 : x^T u = 0 = x^T v\}$, where $u$ and $v$ are any two fixed nonzero vectors in $\mathbb{R}^3$.
Can you describe all subspaces of the 3-space $\mathbb{R}^3$?


Problem 2.3 Let $V = C(\mathbb{R})$ be the vector space of all continuous functions on $\mathbb{R}$. Which of the following sets $W$ are subspaces of $V$? Justify your answer.
(1) $W$ is the set of all differentiable functions on $\mathbb{R}$.
(2) $W$ is the set of all bounded continuous functions on $\mathbb{R}$.
(3) $W$ is the set of all continuous nonnegative-valued functions on $\mathbb{R}$, i.e., $f(x) \ge 0$ for any $x \in \mathbb{R}$.
(4) $W$ is the set of all continuous odd functions on $\mathbb{R}$, i.e., $f(-x) = -f(x)$ for any $x \in \mathbb{R}$.
(5) $W$ is the set of all polynomials with integer coefficients.


Definition 2.3 Let $U$ and $W$ be two subspaces of a vector space $V$.
(1) The sum of $U$ and $W$ is defined by
$$U + W = \{u + w \in V : u \in U,\ w \in W\}.$$
(2) A vector space $V$ is called the direct sum of two subspaces $U$ and $W$, written as $V = U \oplus W$, if $V = U + W$ and $U \cap W = \{0\}$.


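It is not hard to see that $U + W$ and $U \cap W$ are also subspaces of $V$. For example, if $V = \mathbb{R}^2$ (the $xy$-plane), $U = \{x e_1 : x \in \mathbb{R}\}$ (the $x$-axis), and $W = \{y e_2 : y \in \mathbb{R}\}$ (the $y$-axis), where $e_1 = (1, 0)$ and $e_2 = (0, 1)$, then it is easy to see that $\mathbb{R}^2 = \mathbb{R} \oplus \mathbb{R}$, by regarding the $x$-axis (and also the $y$-axis) as $\mathbb{R}$. Similarly, one can easily be convinced that $\mathbb{R}^3 = \mathbb{R}^2 \oplus \mathbb{R}^1 = \mathbb{R}^1 \oplus \mathbb{R}^1 \oplus \mathbb{R}^1$.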


Problem 2.4 Let $U$ and $W$ be subspaces of a vector space $V$.
(1) If $Z$ is a subspace of $V$ contained in both $U$ and $W$, prove that $Z$ is also contained in $U \cap W$.
(2) Suppose that $Z$ is a subspace of $V$ containing both $U$ and $W$. Show that $U + W$ is a subspace of $Z$.




Theorem 2.4 A vector space $V$ is the direct sum of subspaces $U$ and $W$, i.e., $V = U \oplus W$, if and only if for any $v \in V$ there exist unique $u \in U$ and $w \in W$ such that $v = u + w$.


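Proof: Suppose that $V = U \oplus W$. Then, for any $v \in V$, there exist vectors $u \in U$ and $w \in W$ such that $v = u + w$, since $V = U + W$. To show uniqueness, suppose that $v$ is also expressed as a sum $u' + w'$ with $u' \in U$ and $w' \in W$. Then $u + w = u' + w'$ implies
$$u - u' = w' - w \in U \cap W = \{0\}.$$
Hence, $u = u'$ and $w = w'$.
Conversely, suppose that every $v \in V$ has a unique expression $v = u + w$ with $u \in U$ and $w \in W$. The existence of such expressions gives $V = U + W$. If there were a nonzero vector $v$ in $U \cap W$, then $v$ could be written as a sum of vectors in $U$ and $W$ in many different ways, for example,
$$v = v + 0 = \tfrac{1}{2}v + \tfrac{1}{2}v = \tfrac{1}{3}v + \tfrac{2}{3}v \in U + W,$$
contradicting uniqueness. Hence $U \cap W = \{0\}$, and $V = U \oplus W$.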


Example 2.6 (Sum, but not direct sum) Consider the standard vectors $e_1 = i$, $e_2 = j$ and $e_3 = k$ in $\mathbb{R}^3$. Let $U = \{a_1 e_1 + b_3 e_3 : a_1, b_3 \in \mathbb{R}\}$ be the $xz$-plane, and let $W = \{a_2 e_2 + c_3 e_3 : a_2, c_3 \in \mathbb{R}\}$ be the $yz$-plane; both are subspaces of $\mathbb{R}^3$. Then a vector in $U + W$ is of the form
$$(a_1 e_1 + b_3 e_3) + (a_2 e_2 + c_3 e_3) = a_1 e_1 + a_2 e_2 + (b_3 + c_3) e_3 = a_1 e_1 + a_2 e_2 + a_3 e_3 = (a_1, a_2, a_3),$$
where $a_3 = b_3 + c_3$, and $a_1$, $a_2$, $a_3$ can be arbitrary numbers. Thus $U + W = \mathbb{R}^3$. However, $\mathbb{R}^3 \neq U \oplus W$, since clearly $e_3 \in U \cap W \neq \{0\}$. In fact, the vector $e_3 \in \mathbb{R}^3$ can be written as a sum of a vector in $U$ and a vector in $W$ in many ways:
$$e_3 = \tfrac{1}{2}e_3 + \tfrac{1}{2}e_3 = \tfrac{1}{3}e_3 + \tfrac{2}{3}e_3 \in U + W.$$
Note that if we had taken $W = \{y e_2 : y \in \mathbb{R}\}$ to be the $y$-axis, then it would be easy to see that $\mathbb{R}^3 = U \oplus W$. Note also that there are many choices of $W$ such that $\mathbb{R}^3 = U \oplus W$.


Problem 2.5 Let $U$ and $W$ be the subspaces of the vector space $M_{n\times n}(\mathbb{R})$ consisting of all symmetric matrices and all skew-symmetric matrices, respectively. Show that $M_{n\times n}(\mathbb{R}) = U \oplus W$. Therefore, the decomposition of a square matrix $A$ given in (3) of Problem 1.11 is unique.
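The decomposition referred to here is $A = \frac{1}{2}(A + A^T) + \frac{1}{2}(A - A^T)$, whose first summand is symmetric and whose second is skew-symmetric. The Python sketch below (helper names are illustrative) computes both parts with exact fractions and checks them on a sample matrix. It only illustrates that $U + W$ is all of $M_{n\times n}(\mathbb{R})$; uniqueness still rests on $U \cap W = \{0\}$, since a matrix equal both to its transpose and to the negative of its transpose must be zero.

```python
from fractions import Fraction

def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]

def sym_skew_parts(A):
    """Return (S, K) with S = (A + A^T)/2 symmetric, K = (A - A^T)/2 skew-symmetric, A = S + K."""
    n = len(A)
    At = transpose(A)
    S = [[Fraction(A[i][j] + At[i][j], 2) for j in range(n)] for i in range(n)]
    K = [[Fraction(A[i][j] - At[i][j], 2) for j in range(n)] for i in range(n)]
    return S, K

A = [[1, 2, 0], [4, 5, 6], [7, 8, 9]]
S, K = sym_skew_parts(A)
assert S == transpose(S)                                           # S^T = S
assert K == [[-entry for entry in row] for row in transpose(K)]    # K^T = -K
assert all(S[i][j] + K[i][j] == A[i][j] for i in range(3) for j in range(3))
print([[str(e) for e in row] for row in S])
print([[str(e) for e in row] for row in K])
```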




2.3 Bases 


Our next concern is how to express vectors in a practical way. It turns out that, in many vector spaces, every vector can be expressed in terms of a small number of particular vectors. As we know, a vector in the 3-space $\mathbb{R}^3$ is of the form $(x_1, x_2, x_3)$, and it can also be written as
$$x = (x_1, x_2, x_3) = x_1 e_1 + x_2 e_2 + x_3 e_3,$$
where $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$ and $e_3 = (0, 0, 1)$. This means that any vector $x = (x_1, x_2, x_3)$ in $\mathbb{R}^3$ can be expressed as a sum of scalar multiples of $e_1$, $e_2$ and $e_3$. Moreover, this expression is unique. Hence, the basic arithmetic rules of vectors in $\mathbb{R}^3$ reduce any computation with vectors to computations with the three vectors $e_1$, $e_2$, $e_3$. However, there may be other sets of vectors that play the same role as these three vectors. Such a set of vectors is called a basis for the vector space. In this section, we discuss such sets.


Definition 2.4 Let $V$ be a vector space, and let $\{x_1, x_2, \ldots, x_n\}$ be a set of vectors in $V$. Then a vector $y$ in $V$ of the form
$$y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n,$$
where $a_1, a_2, \ldots, a_n$ are scalars, is called a linear combination of the vectors $x_1, x_2, \ldots, x_n$.


The next theorem shows that the set of all linear combinations of a finite 
set of vectors in a vector space forms a subspace. 


Theorem 2.5 Let $x_1, x_2, \ldots, x_n$ be vectors in a vector space $V$. Then the set $W = \{a_1 x_1 + a_2 x_2 + \cdots + a_n x_n : a_i \in \mathbb{R}\}$ of all linear combinations of $x_1, x_2, \ldots, x_n$ is a subspace of $V$. It is called the subspace of $V$ spanned by $x_1, x_2, \ldots, x_n$; we also say that $x_1, x_2, \ldots, x_n$ span the subspace $W$.


Proof: We show that $W$ is closed under the vector sum and the scalar multiplication. Let $u$ and $w$ be two vectors in $W$. Then
$$u = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n, \qquad w = b_1 x_1 + b_2 x_2 + \cdots + b_n x_n$$
for some scalars $a_i$ and $b_i$. Then, for any scalar $k \in \mathbb{R}$,
$$u + w = (a_1 + b_1)x_1 + (a_2 + b_2)x_2 + \cdots + (a_n + b_n)x_n,$$
$$ku = (ka_1)x_1 + (ka_2)x_2 + \cdots + (ka_n)x_n.$$
These equations show that $u + w$ and $ku$ are linear combinations of $x_1, x_2, \ldots, x_n$, and consequently they are contained in $W$.


Example 2.7 (1) For a nonzero vector $v$ in a vector space $V$, a linear combination of $v$ is simply a scalar multiple of $v$. Thus the subspace $W$ of $V$ spanned by $v$ is $W = \{kv : k \in \mathbb{R}\}$.

(2) Consider three vectors $v_1 = (1, 1, 1)$, $v_2 = (1, -1, 1)$ and $v_3 = (1, 0, 1)$ in $\mathbb{R}^3$. The subspace $W_1$ spanned by $v_1$ and $v_2$ is written as
$$W_1 = \{a_1 v_1 + a_2 v_2 = (a_1 + a_2,\ a_1 - a_2,\ a_1 + a_2) : a_i \in \mathbb{R}\},$$
and the subspace $W_2$ spanned by $v_1$, $v_2$ and $v_3$ is written as
$$W_2 = \{a_1 v_1 + a_2 v_2 + a_3 v_3 = (a_1 + a_2 + a_3,\ a_1 - a_2,\ a_1 + a_2 + a_3) : a_i \in \mathbb{R}\}.$$
Then $a_1 v_1 + a_2 v_2 = a_1 v_1 + a_2 v_2 + 0 v_3$ implies $W_1 \subseteq W_2$. On the other hand, any vector in $W_2$ is of the form $a_1 v_1 + a_2 v_2 + a_3 v_3$. But since $v_3 = \frac{1}{2}v_1 + \frac{1}{2}v_2 \in W_1$, we have $a_1 v_1 + a_2 v_2 + a_3 v_3 = (a_1 + \frac{a_3}{2})v_1 + (a_2 + \frac{a_3}{2})v_2$, so $W_2 \subseteq W_1$. Thus $W_1 = W_2$, which is a plane in $\mathbb{R}^3$ containing the vectors $v_1$, $v_2$ and $v_3$. In general, a subspace of a vector space can have many different spanning sets.
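The key identity $v_3 = \frac{1}{2}v_1 + \frac{1}{2}v_2$ and the rewriting of an arbitrary vector of $W_2$ as a combination of $v_1$ and $v_2$ alone can be checked with exact arithmetic. In this Python sketch, `lin_comb` and `rewrite_in_W1` are illustrative helper names, and the coefficients $(2, -3, 5)$ are an arbitrary test choice.

```python
from fractions import Fraction

def lin_comb(coeffs, vectors):
    """Return a1*v1 + ... + an*vn for coefficients a_i and equal-length tuples v_i."""
    n = len(vectors[0])
    return tuple(sum(a * v[i] for a, v in zip(coeffs, vectors)) for i in range(n))

v1, v2, v3 = (1, 1, 1), (1, -1, 1), (1, 0, 1)
half = Fraction(1, 2)
assert lin_comb([half, half], [v1, v2]) == v3   # v3 = (1/2)v1 + (1/2)v2, so v3 is in W1

def rewrite_in_W1(a1, a2, a3):
    """Coefficients (b1, b2) with a1 v1 + a2 v2 + a3 v3 = b1 v1 + b2 v2, using v3 = (v1 + v2)/2."""
    return (a1 + half * a3, a2 + half * a3)

a = (2, -3, 5)
b = rewrite_in_W1(*a)
assert lin_comb(a, [v1, v2, v3]) == lin_comb(b, [v1, v2])   # same vector, so W2 lies in W1
print(", ".join(str(c) for c in b))   # 9/2, -1/2
```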


Example 2.8 Let
$$e_1 = (1, 0, 0, \ldots, 0),\quad e_2 = (0, 1, 0, \ldots, 0),\quad \ldots,\quad e_n = (0, 0, 0, \ldots, 1)$$
be $n$ vectors in the n-space $\mathbb{R}^n$ ($n > 3$). Then a linear combination of $e_1$, $e_2$, $e_3$ is of the form
$$a_1 e_1 + a_2 e_2 + a_3 e_3 = (a_1, a_2, a_3, 0, \ldots, 0).$$
Hence, the set
$$W = \{(a_1, a_2, a_3, 0, \ldots, 0) \in \mathbb{R}^n : a_1, a_2, a_3 \in \mathbb{R}\}$$
is the subspace of the n-space $\mathbb{R}^n$ spanned by the vectors $e_1$, $e_2$, $e_3$. Note that the subspace $W$ can be identified with the 3-space $\mathbb{R}^3$ through the identification
$$(a_1, a_2, a_3, 0, \ldots, 0) \equiv (a_1, a_2, a_3)$$
with $a_i \in \mathbb{R}$. In general, for $m < n$, the m-space $\mathbb{R}^m$ can be identified with a subspace of the n-space $\mathbb{R}^n$.




Example 2.9 [Column space] Let $A = [a_1\ a_2\ \cdots\ a_n]$ be an $m \times n$ matrix with columns $a_j$. Then the column vectors $a_j$ are in $\mathbb{R}^m$, and the matrix product $Ax$ is the linear combination of the column vectors $a_j$ whose coefficients are the components of $x \in \mathbb{R}^n$, i.e., $Ax = x_1 a_1 + x_2 a_2 + \cdots + x_n a_n$ (see Example 1.9). Therefore, the set
$$W = \{Ax \in \mathbb{R}^m : x \in \mathbb{R}^n\}$$
of all linear combinations of the column vectors of $A$ is a subspace of $\mathbb{R}^m$, called the column space of $A$. Consequently, $Ax = b$ has a solution $(x_1, x_2, \ldots, x_n)$ in $\mathbb{R}^n$ if and only if the vector $b$ belongs to the column space of $A$.
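Both statements of Example 2.9 can be illustrated in a few lines. The Python sketch below computes $Ax$ column by column as $x_1 a_1 + \cdots + x_n a_n$, and tests whether $b$ lies in the column space by comparing the rank of $A$ with the rank of the augmented matrix $[A \mid b]$. That rank test is a standard criterion rather than one stated in this section, the helper names are illustrative, and the rank is computed by Gaussian elimination with exact fractions.

```python
from fractions import Fraction

def matvec_by_columns(A, x):
    """Compute Ax as x1*a1 + ... + xn*an, where a_j is column j of A (A is a list of rows)."""
    m, n = len(A), len(A[0])
    result = [0] * m
    for j in range(n):              # add x_j times column j
        for i in range(m):
            result[i] += x[j] * A[i][j]
    return result

def rank(rows):
    """Number of pivots found by Gaussian elimination, with exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def in_column_space(A, b):
    """b is in C(A) iff appending b as an extra column does not increase the rank."""
    augmented = [list(row) + [bi] for row, bi in zip(A, b)]
    return rank(augmented) == rank(A)

A = [[1, 2], [2, 4], [0, 1]]
print(matvec_by_columns(A, [3, -1]))    # [1, 2, -1]
print(in_column_space(A, [1, 2, -1]))   # True
print(in_column_space(A, [1, 0, 0]))    # False
```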


Problem 2.6 Let $x_1, x_2, \ldots, x_n$ be vectors in a vector space $V$ and let $W$ be the subspace spanned by $x_1, x_2, \ldots, x_n$. Show that $W$ is the smallest subspace of $V$ containing $x_1, x_2, \ldots, x_n$. In other words, if $U$ is a subspace of $V$ containing $x_1, x_2, \ldots, x_n$, then $W \subseteq U$.


Problem 2.7 Show that the set of all matrices of the form $AB - BA$ cannot span the vector space $M_{n\times n}(\mathbb{R})$.


As we saw in Theorem 2.5 and Example 2.7, any nonempty subset of a vector space $V$ spans a subspace through the linear combinations of its vectors, and many different sets may span the same subspace. Moreover, even for a fixed set of vectors, a vector may be expressible as a linear combination in many different ways: in Example 2.7(2), for instance, $v_3 = 0v_1 + 0v_2 + 1v_3 = \frac{1}{2}v_1 + \frac{1}{2}v_2 + 0v_3$.

However, for some sets of vectors in a vector space $V$, every vector in $V$ can be expressed uniquely as a linear combination of the set. Such a set of vectors is called a basis for $V$. In the following we make this notion precise and show how to find such a basis.


Definition 2.5 A set of vectors $\{x_1, x_2, \ldots, x_n\}$ in a vector space $V$ is said to be linearly independent if the vector equation, called the linear dependence of the $x_i$'s,
$$c_1 x_1 + c_2 x_2 + \cdots + c_n x_n = 0$$
has only the trivial solution $c_1 = c_2 = \cdots = c_n = 0$. Otherwise, it is said to be linearly dependent.


By definition, a set of vectors $\{x_1, x_2, \ldots, x_n\}$ is linearly dependent if and only if the linear dependence
$$c_1 x_1 + c_2 x_2 + \cdots + c_n x_n = 0$$
has a nontrivial solution $(c_1, c_2, \ldots, c_n)$. If, say, $c_n \neq 0$, the equation can be rewritten as
$$x_n = -\frac{c_1}{c_n}x_1 - \frac{c_2}{c_n}x_2 - \cdots - \frac{c_{n-1}}{c_n}x_{n-1}.$$
Conversely, if one vector is a linear combination of the others, moving it to the other side of the equation gives a nontrivial solution with coefficient $-1$ on that vector. That is, a set of vectors is linearly dependent if and only if at least one of the vectors in the set can be written as a linear combination of the others.


Example 2.10 Let $x = (1, 2, 3)$ and $y = (3, 2, 1)$ be two vectors in the 3-space $\mathbb{R}^3$. Then clearly $y \neq \lambda x$ for any $\lambda \in \mathbb{R}$ (equivalently, $ax + by = 0$ is possible only when $a = b = 0$). This means that $\{x, y\}$ is linearly independent in $\mathbb{R}^3$. If $w = (3, 6, 9)$, then $\{x, w\}$ is linearly dependent, since $w - 3x = 0$. In general, if $x$, $y$ are non-collinear vectors in the 3-space $\mathbb{R}^3$, the set of all linear combinations of $x$ and $y$ determines a plane $W$ through the origin in $\mathbb{R}^3$, i.e., $W = \{ax + by : a, b \in \mathbb{R}\}$. Let $z$ be another nonzero vector in the 3-space $\mathbb{R}^3$. If $z \in W$, then there are scalars $a, b \in \mathbb{R}$, not both zero, such that $z = ax + by$; that is, the set $\{x, y, z\}$ is linearly dependent. If $z \notin W$, then $ax + by + cz = 0$ is possible only when $a = b = c = 0$ (prove it). Therefore, the set $\{x, y, z\}$ is linearly independent if and only if $z$ does not lie in $W$.
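Following the matrix viewpoint of Example 2.9, a set $\{x_1, \ldots, x_k\}$ in $\mathbb{R}^n$ is linearly independent exactly when the homogeneous system with coefficient matrix $[x_1\ \cdots\ x_k]$ has no free variables, that is, when Gaussian elimination finds a pivot in every one of its $k$ columns. The Python sketch below (helper names are illustrative) applies this test to the vectors of Example 2.10; the two choices of $z$ are hypothetical, one inside the plane $W$ and one outside it.

```python
from fractions import Fraction

def pivot_count(rows):
    """Number of pivots found by Gaussian elimination (exact rational arithmetic)."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column: a free variable
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """True iff c1 x1 + ... + ck xk = 0 has only the trivial solution."""
    k = len(vectors)
    A = [[v[i] for v in vectors] for i in range(len(vectors[0]))]   # column j holds x_j
    return pivot_count(A) == k

x, y, w = (1, 2, 3), (3, 2, 1), (3, 6, 9)
print(is_linearly_independent([x, y]))           # True
print(is_linearly_independent([x, w]))           # False, since w - 3x = 0
z_in = tuple(2 * a - b for a, b in zip(x, y))    # z = 2x - y lies in the plane W
z_out = (0, 0, 1)                                # not a combination of x and y
print(is_linearly_independent([x, y, z_in]))     # False
print(is_linearly_independent([x, y, z_out]))    # True
```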


By an abuse of language, it is sometimes convenient to say that “the vectors $x_1, x_2, \ldots, x_n$ are linearly independent,” although this is really a property of a set.


Example 2.11 The columns of the matrix
$$A = \begin{bmatrix} 1 & 2 & 3 & 10 \\ 4 & 2 & 6 & 8 \\ 2 & -1 & 1 & 3 \end{bmatrix}$$
are linearly dependent in the 3-space $\mathbb{R}^3$, since the third column is the sum of the first and the second.


As Example 2.11 shows, the concept of linear dependence can be applied to the row or column vectors of any matrix.


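Example 2.12 Consider the upper triangular matrix
$$A = \begin{bmatrix} 2 & 3 & 5 \\ 0 & 1 & 6 \\ 0 & 0 & 4 \end{bmatrix}.$$
The linear dependence of the column vectors of $A$ may be written as
$$c_1 \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix} + c_2 \begin{bmatrix} 3 \\ 1 \\ 0 \end{bmatrix} + c_3 \begin{bmatrix} 5 \\ 6 \\ 4 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix},$$
which, in matrix notation, may be written as a homogeneous system:
$$\begin{bmatrix} 2 & 3 & 5 \\ 0 & 1 & 6 \\ 0 & 0 & 4 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
From the third row, $c_3 = 0$; from the second row, $c_2 = 0$; and substituting these into the first row forces $c_1 = 0$. That is, the homogeneous system has only the trivial solution, so the column vectors are linearly independent.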
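The back-substitution argument of Example 2.12 works for any upper triangular system with nonzero diagonal entries. The Python sketch below (the name `back_substitute` is illustrative) solves $Uc = b$ from the last row upward; with $b = 0$ it returns only the trivial solution, which is exactly the linear independence of the columns.

```python
from fractions import Fraction

def back_substitute(U, b):
    """Solve Uc = b for upper triangular U with nonzero diagonal, last row first."""
    n = len(U)
    c = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        if U[i][i] == 0:
            raise ValueError("zero diagonal entry: this sketch needs a nonzero diagonal")
        known = sum(Fraction(U[i][j]) * c[j] for j in range(i + 1, n))   # already-solved terms
        c[i] = (Fraction(b[i]) - known) / U[i][i]
    return c

A = [[2, 3, 5], [0, 1, 6], [0, 0, 4]]
print(back_substitute(A, [0, 0, 0]) == [0, 0, 0])         # True: only the trivial solution
print([str(v) for v in back_substitute(A, [10, 7, 4])])   # ['1', '1', '1']
```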


The following theorem can be proven by the same argument. 


Theorem 2.6 The nonzero rows of a matrix in row-echelon form are linearly independent, and so are the columns that contain leading 1's.


In particular, the rows of any triangular matrix with nonzero diagonal entries are linearly independent, and so are its columns. If $U$ is the reduced row-echelon form of $A$, then we know that $Ax = 0$ and $Ux = 0$ have the same set of solutions. Moreover, a homogeneous system $Ax = 0$ with more unknowns than equations always has a nontrivial solution by Theorem 1.2. This proves the following lemma.


Lemma 2.7 (1) Any set of $n$ vectors in the m-space $\mathbb{R}^m$ is linearly dependent if $n > m$.

(2) If $U$ is the reduced row-echelon form of $A$, then the columns of $U$ are linearly independent if and only if the columns of $A$ are linearly independent.


Example 2.13 Consider the vectors $e_1 = (1, 0, 0)$, $e_2 = (0, 1, 0)$ and $e_3 = (0, 0, 1)$ in the 3-space $\mathbb{R}^3$. The vector equation $c_1 e_1 + c_2 e_2 + c_3 e_3 = 0$ becomes
$$c_1(1, 0, 0) + c_2(0, 1, 0) + c_3(0, 0, 1) = (0, 0, 0)$$
or, equivalently, $(c_1, c_2, c_3) = (0, 0, 0)$. Thus $c_1 = c_2 = c_3 = 0$, so the set of vectors $\{e_1, e_2, e_3\}$ is linearly independent; it also spans $\mathbb{R}^3$.




Example 2.14 The vectors $e_1, e_2, \ldots, e_n$ in $\mathbb{R}^n$ are clearly linearly independent (see Theorem 2.6). Moreover, they span the n-space $\mathbb{R}^n$: in fact, a vector $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ is a linear combination of the vectors $e_i$:
$$x = (x_1, x_2, \ldots, x_n) = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n.$$


Definition 2.6 Let V be a vector space. A basis for V is a set of linearly 
independent vectors that spans V. 


For example, as in Example 2.14, the set $\{e_1, e_2, \ldots, e_n\}$ forms a basis, called the standard basis for the n-space $\mathbb{R}^n$. Of course, there are many other bases for $\mathbb{R}^n$.


Example 2.15 (1) The set of vectors $(1, 1, 0)$, $(0, -1, 1)$ and $(1, 0, 1)$ is not a basis for the 3-space $\mathbb{R}^3$, since this set is linearly dependent (the third is the sum of the first two vectors) and cannot span $\mathbb{R}^3$. (The vector $(1, 0, 0)$ cannot be obtained as a linear combination of them (prove it).)

(2) The set of vectors $(1, 0, 0)$, $(0, 1, 1)$, $(1, 0, 1)$ and $(0, 1, 0)$ is not a basis either, since they are not linearly independent (the sum of the first two minus the third gives the fourth), even though they span $\mathbb{R}^3$. This set contains redundant vectors for spanning $\mathbb{R}^3$.

(3) The set of vectors $(1, 1, 1)$, $(0, 1, 1)$ and $(0, 0, 1)$ is linearly independent and also spans $\mathbb{R}^3$. That is, it is a basis for $\mathbb{R}^3$, different from the standard basis. This set has the proper number of vectors for spanning $\mathbb{R}^3$: it cannot be reduced to a smaller spanning set, nor does it need any additional vector to span $\mathbb{R}^3$.


By definition, in order to show that a set of vectors in a vector space is a basis, one needs to show two things: it is linearly independent, and it spans the whole space.

In particular, for a set of vectors $\alpha = \{x_1, x_2, \ldots, x_n\}$ in $\mathbb{R}^m$, one can construct the $m \times n$ matrix $A = [x_1\ x_2\ \cdots\ x_n]$. Then, in view of Example 2.9:
(1) $\alpha$ is linearly independent if and only if $Ax = a_1 x_1 + \cdots + a_n x_n = 0$ has only the trivial solution, and $\alpha$ is linearly dependent if and only if this equation has a nontrivial solution $x = (a_1, \ldots, a_n) \in \mathbb{R}^n$.
(2) $\alpha$ spans $\mathbb{R}^m$ if and only if for every $b \in \mathbb{R}^m$ there is an $x = (a_1, \ldots, a_n) \in \mathbb{R}^n$ such that $Ax = b$.
Thus, if $\alpha$ is a basis for $\mathbb{R}^m$, then we must have $m = n$: linear independence forces $n \le m$ by Lemma 2.7(1), and spanning forces $n \ge m$ (this follows from Lemma 2.9 in the next section).
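Combining (1) and (2) with $m = n$: the vectors form a basis for $\mathbb{R}^n$ exactly when the square matrix $A = [x_1\ \cdots\ x_n]$ has a pivot in every column, because then the echelon form is upper triangular with nonzero diagonal, so $Ax = b$ can be solved by back substitution for every $b$ and $Ax = 0$ has only the trivial solution. A Python sketch of this test (helper names illustrative), run on the three sets of Example 2.15:

```python
from fractions import Fraction

def pivot_count(rows):
    """Number of pivots found by Gaussian elimination (exact rational arithmetic)."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def is_basis_of_Rn(vectors):
    """True iff the given vectors form a basis for R^n, where n is their common length."""
    n = len(vectors[0])
    if len(vectors) != n:                               # a basis for R^n has exactly n vectors
        return False
    A = [[v[i] for v in vectors] for i in range(n)]     # the vectors as columns
    return pivot_count(A) == n

print(is_basis_of_Rn([(1, 1, 0), (0, -1, 1), (1, 0, 1)]))              # False: Example 2.15(1)
print(is_basis_of_Rn([(1, 0, 0), (0, 1, 1), (1, 0, 1), (0, 1, 0)]))    # False: Example 2.15(2)
print(is_basis_of_Rn([(1, 1, 1), (0, 1, 1), (0, 0, 1)]))               # True: Example 2.15(3)
```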

The following theorem shows that a basis for a vector space provides a coordinate system, just as the standard basis for $\mathbb{R}^n$ provides the rectangular coordinate system.




Theorem 2.8 Let $\alpha = \{v_1, v_2, \ldots, v_n\}$ be a basis for a vector space $V$. Then each vector $x$ in $V$ can be uniquely expressed as a linear combination of $v_1, v_2, \ldots, v_n$, i.e., there are unique scalars $a_i$, $i = 1, 2, \ldots, n$, such that
$$x = a_1 v_1 + a_2 v_2 + \cdots + a_n v_n.$$

Proof: Such scalars exist because $\alpha$ spans $V$. If $x$ can also be expressed as $x = b_1 v_1 + b_2 v_2 + \cdots + b_n v_n$, then we have $0 = (a_1 - b_1)v_1 + (a_2 - b_2)v_2 + \cdots + (a_n - b_n)v_n$. By the linear independence of the $v_i$'s, $a_i = b_i$ for all $i = 1, 2, \ldots, n$.


Example 2.16 Let $\alpha = \{e_1, e_2, e_3\}$ be the standard basis for $\mathbb{R}^3$, and let $\beta = \{v_1, v_2, v_3\}$ with $v_1 = (1, 1, 1) = e_1 + e_2 + e_3$, $v_2 = (0, 1, 1) = e_2 + e_3$, $v_3 = (0, 0, 1) = e_3$. Then $\beta$ is also a basis for $\mathbb{R}^3$ (see Example 2.15(3)). For any $x = (x_1, x_2, x_3) \in \mathbb{R}^3$, one can easily verify that
$$x = x_1 e_1 + x_2 e_2 + x_3 e_3 = x_1 v_1 + (x_2 - x_1)v_2 + (x_3 - x_2)v_3.$$
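Finding the coordinates of $x$ with respect to a basis amounts to solving the linear system $a_1 v_1 + \cdots + a_n v_n = x$, whose solution is unique by Theorem 2.8. The Python sketch below (the function name `coordinates` is illustrative) does this by Gauss-Jordan elimination on the augmented matrix $[v_1\ \cdots\ v_n \mid x]$ and checks the formula of Example 2.16 for one sample $x$.

```python
from fractions import Fraction

def coordinates(basis, x):
    """Unique scalars a_i with a1 v1 + ... + an vn = x, assuming basis is a basis of R^n."""
    n = len(basis)
    # augmented matrix [v1 ... vn | x]: column j holds v_j
    M = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(x[i])] for i in range(n)]
    for c in range(n):
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            raise ValueError("the given vectors do not form a basis")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]            # scale the pivot to 1
        for i in range(n):                          # clear the rest of column c
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[n] for row in M]

beta = [(1, 1, 1), (0, 1, 1), (0, 0, 1)]
x = (4, -2, 7)
coords = coordinates(beta, x)
print(", ".join(str(a) for a in coords))   # 4, -6, 9, i.e. (x1, x2 - x1, x3 - x2)
```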


Problem 2.8 Show that the vectors $v_1 = (1, 2, 1)$, $v_2 = (2, 9, 0)$ and $v_3 = (3, 3, 4)$ in the 3-space $\mathbb{R}^3$ form a basis.


Problem 2.9 Show that the set $\{1, x, x^2, \ldots, x^n\}$ is a basis for $P_n(\mathbb{R})$, the vector space of all polynomials of degree $\le n$ with real coefficients.


Problem 2.10 In the n-space $\mathbb{R}^n$, determine whether the following set of vectors is linearly dependent:
$$\{e_1 - e_2,\ e_2 - e_3,\ \ldots,\ e_{n-1} - e_n,\ e_n - e_1\}.$$


Problem 2.11 Let $x_k$ denote the vector in $\mathbb{R}^n$ whose first $k - 1$ coordinates are zero and whose last $n - k + 1$ coordinates are 1. Show that the set $\{x_1, x_2, \ldots, x_n\}$ is a basis for $\mathbb{R}^n$.


2.4 Dimensions 


We often say that the line $\mathbb{R}^1$ is one-dimensional, the plane $\mathbb{R}^2$ is two-dimensional, the space $\mathbb{R}^3$ is three-dimensional, and so on. This is mostly because the number of degrees of freedom in choosing coordinates for elements of these spaces is 1, 2 or 3, respectively. In fact, this number of degrees of freedom is closely related to the size of a basis. Even though there is no unique way of choosing a basis, there is something common to all bases, which is an intrinsic property of the space itself and leads to the notion of dimension. We begin with the following lemma:




Lemma 2.9 Let $V$ be a vector space and let $\alpha = \{x_1, x_2, \ldots, x_m\}$ be a set of $m$ vectors in $V$.
(1) If $\alpha$ spans $V$, then no set of more than $m$ vectors in $V$ can be linearly independent.

(2) If $\alpha$ is linearly independent, then no set of fewer than $m$ vectors in $V$ can span $V$.


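Proof: Part (2) follows directly from (1): if a set of fewer than $m$ vectors spanned $V$, then by (1) the $m$ vectors of $\alpha$ could not be linearly independent. So we prove (1) only. Let $\beta = \{y_1, y_2, \ldots, y_n\}$ be a set of $n$ vectors in $V$ with $n > m$. We will show that $\beta$ is linearly dependent. Indeed, since $\alpha$ spans $V$, for $j = 1, 2, \ldots, n$,
$$y_j = a_{1j}x_1 + a_{2j}x_2 + \cdots + a_{mj}x_m = \sum_{i=1}^{m} a_{ij}x_i.$$
Hence
$$\begin{aligned}
c_1 y_1 + c_2 y_2 + \cdots + c_n y_n &= c_1(a_{11}x_1 + a_{21}x_2 + \cdots + a_{m1}x_m) \\
&\quad + c_2(a_{12}x_1 + a_{22}x_2 + \cdots + a_{m2}x_m) \\
&\quad + \cdots + c_n(a_{1n}x_1 + a_{2n}x_2 + \cdots + a_{mn}x_m) \\
&= (a_{11}c_1 + a_{12}c_2 + \cdots + a_{1n}c_n)x_1 \\
&\quad + (a_{21}c_1 + a_{22}c_2 + \cdots + a_{2n}c_n)x_2 \\
&\quad + \cdots + (a_{m1}c_1 + a_{m2}c_2 + \cdots + a_{mn}c_n)x_m.
\end{aligned}$$
Thus, $\beta$ is linearly dependent if and only if the equation
$$c_1 y_1 + c_2 y_2 + \cdots + c_n y_n = 0$$
has a nontrivial solution $(c_1, c_2, \ldots, c_n) \neq (0, 0, \ldots, 0)$. This is certainly the case if we can choose $c_1, \ldots, c_n$, not all zero, so that all the coefficients of the $x_i$'s above are zero. That is, it suffices to show that the homogeneous system of linear equations in the $c_j$'s,
$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix},$$
has a nontrivial solution. This is guaranteed by Lemma 2.7, since the coefficient matrix $A = [a_{ij}]$ is an $m \times n$ matrix with $m < n$.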
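The proof reduces Lemma 2.9 to Lemma 2.7: a homogeneous system with more unknowns than equations has a nontrivial solution. The Python sketch below (helper names illustrative) actually produces one, by computing the reduced row-echelon form, setting one free variable to 1 and the others to 0, and solving for the pivot variables. Applied to three vectors in $\mathbb{R}^2$ (a hypothetical example), it exhibits an explicit linear dependence.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form over the rationals; returns (R, pivot_columns)."""
    M = [[Fraction(v) for v in row] for row in rows]
    m, n = len(M), len(M[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [v / piv for v in M[r]]
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return M, pivots

def nontrivial_solution(A):
    """A nonzero c with Ac = 0, assuming A has more columns (unknowns) than rows (equations)."""
    m, n = len(A), len(A[0])
    if n <= m:
        raise ValueError("this sketch only handles the case n > m")
    R, pivots = rref(A)
    free = next(j for j in range(n) if j not in pivots)   # exists since len(pivots) <= m < n
    c = [Fraction(0)] * n
    c[free] = Fraction(1)
    for row_idx, pc in enumerate(pivots):
        c[pc] = -R[row_idx][free]        # pivot variable expressed through the free one
    return c

ys = [(1, 2), (3, 4), (5, 6)]                 # three vectors in R^2
A = [[y[i] for y in ys] for i in range(2)]    # 2 x 3 matrix with the y_j as columns
c = nontrivial_solution(A)
print(", ".join(str(v) for v in c))           # 1, -2, 1
combo = [sum(cj * y[i] for cj, y in zip(c, ys)) for i in range(2)]
print(combo == [0, 0])                        # True: c1 y1 + c2 y2 + c3 y3 = 0
```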




It is clear by Lemma 2.9 that if a set $\alpha = \{x_1, x_2, \ldots, x_n\}$ of $n$ vectors is a basis for a vector space $V$, then it is a maximal linearly independent set and a minimal spanning set, and so no other set $\beta = \{y_1, y_2, \ldots, y_r\}$ of $r$ vectors can be a basis for $V$ if $r \neq n$. Therefore, all bases for a vector space $V$ have the same number of vectors, even though a vector space may have infinitely many different bases.


Theorem 2.10 If a basis for a vector space $V$ consists of $n$ vectors, then so does every other basis.


Definition 2.7 The dimension of a vector space $V$ is the number, say $n$, of vectors in a basis for $V$, denoted by $\dim V = n$. When $V$ has a basis consisting of finitely many vectors, $V$ is said to be finite dimensional.


Example 2.17 The following can be easily verified:

(1) If $V$ has only the zero vector, $V = \{0\}$, then $\dim V = 0$ (the empty set is a basis).

(2) If $V = \mathbb{R}^n$, then the standard basis $\{e_1, e_2, \ldots, e_n\}$ for $V$ implies $\dim \mathbb{R}^n = n$.

(3) If $V = P_n(\mathbb{R})$ is the space of all polynomials of degree less than or equal to $n$, then $\dim P_n(\mathbb{R}) = n + 1$, since $\{1, x, x^2, \ldots, x^n\}$ is a basis for $V$.

(4) If $V = M_{m\times n}(\mathbb{R})$ is the space of all $m \times n$ matrices, then $\dim M_{m\times n}(\mathbb{R}) = mn$, since $\{E_{ij} : i = 1, \ldots, m,\ j = 1, \ldots, n\}$ is a basis for $V$, where $E_{ij}$ is the $m \times n$ matrix whose $(i, j)$-th entry is 1 and all other entries are zero.


If V = C(R) of all real-valued continuous functions defined on the real 
line, then one can show that V is not finite dimensional. A vector space 
V is infinite dimensional if it is not finite dimensional. In this book, we 
are concerned only with finite dimensional vector spaces unless otherwise 
stated. 


Theorem 2.11 Let V be a finite dimensional vector space. 


(1) Any linearly independent set of vectors in V can be extended to a basis 
by adding more vectors if necessary. 


(2) Any set of vectors that spans V can be reduced to a basis by discarding 
vectors if necessary. 


Proof: We prove (1) only and leave (2) as an exercise. Let $\alpha = \{x_1, x_2, \ldots, x_k\}$ be a linearly independent set in $V$. If $\alpha$ spans $V$, then $\alpha$ is a basis. If $\alpha$ does not span $V$, then there exists a vector, say $x_{k+1}$, in $V$ that is not contained in the subspace spanned by the vectors in $\alpha$. Now $\{x_1, \ldots, x_k, x_{k+1}\}$ is linearly independent (check why). If $\{x_1, \ldots, x_k, x_{k+1}\}$ spans $V$, then it is a basis for $V$. If it does not span $V$, then the same procedure can be repeated until a basis for $V$ is obtained. This procedure stops after finitely many steps because, by Lemma 2.9, a linearly independent set in the finite dimensional vector space $V$ cannot have more than $\dim V$ vectors.
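For $V = \mathbb{R}^n$ the procedure in this proof can be made concrete by drawing candidate vectors from the standard basis: each $e_i$ outside the current span is added, and since $e_1, \ldots, e_n$ span $\mathbb{R}^n$, the process must reach $n$ vectors. The Python sketch below implements it under that assumption (other candidate lists work equally well, and the names are illustrative); membership in the span is tested by checking whether appending $e_i$ raises the rank, computed by row reduction.

```python
from fractions import Fraction

def rank(rows):
    """Number of pivots after Gaussian elimination with exact rational arithmetic."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(M[0]) if M else 0):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def extend_to_basis(vectors, n):
    """Extend a linearly independent list of vectors in R^n to a basis of R^n
    by appending standard basis vectors e_i that lie outside the current span."""
    basis = [tuple(v) for v in vectors]
    if basis and rank(basis) != len(basis):
        raise ValueError("the starting vectors must be linearly independent")
    for i in range(n):
        if len(basis) == n:
            break
        e_i = tuple(1 if j == i else 0 for j in range(n))
        if rank(basis + [e_i]) == len(basis) + 1:   # e_i is not in the span of basis
            basis.append(e_i)
    return basis

print(extend_to_basis([(1, 1, 1)], 3))   # [(1, 1, 1), (1, 0, 0), (0, 1, 0)]
```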


In particular, if $W$ is a subspace of $V$, then any basis for $W$ is linearly independent in $V$ too, and can be extended to a basis for $V$. Thus $\dim W \le \dim V$. The following is a direct consequence of Theorem 2.11.


Corollary 2.12 Let V be a vector space of dimension n. Then 
(1) any set of n vectors that spans V is a basis for V, and 


(2) any set of n linearly independent vectors is a basis for V. 


Therefore, if a set of $n = \dim V$ vectors in a vector space $V$ either is linearly independent or spans $V$, then it is already a basis for $V$.


Example 2.18 Let $W$ be the subspace of $\mathbb{R}^4$ spanned by the vectors
$$x_1 = (1, -2, 5, -3),\quad x_2 = (0, 1, 1, 4),\quad x_3 = (1, 0, 1, 0).$$
Find a basis for $W$ and extend it to a basis for $\mathbb{R}^4$.


Solution: Note that $\dim W \le 3$, since $W$ is spanned by the three vectors $x_i$. Let $A$ be the $3 \times 4$ matrix whose rows are $x_1$, $x_2$ and $x_3$:
$$A = \begin{bmatrix} 1 & -2 & 5 & -3 \\ 0 & 1 & 1 & 4 \\ 1 & 0 & 1 & 0 \end{bmatrix}.$$
Reduce $A$ to a row-echelon form:
$$U = \begin{bmatrix} 1 & -2 & 5 & -3 \\ 0 & 1 & 1 & 4 \\ 0 & 0 & 1 & \frac{5}{6} \end{bmatrix}.$$
The three nonzero row vectors of $U$ are clearly linearly independent (Theorem 2.6). They also span $W$: each row of $U$ is a linear combination of $x_1$, $x_2$, $x_3$, and, since elementary row operations can be reversed, the vectors $x_1$, $x_2$ and $x_3$ can in turn be expressed as linear combinations of these three nonzero row vectors of $U$. Hence, the nonzero rows of $U$ provide a basis for $W$. (Note that this implies $\dim W = 3$, and hence $\{x_1, x_2, x_3\}$ is also a basis for $W$ by Corollary 2.12. The linear independence of the $x_i$'s is a by-product of this fact.)

To extend it to a basis for $\mathbb{R}^4$, just add any nonzero vector of the form $x_4 = (0, 0, 0, t)$ to the rows of $U$.
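The row reduction in this solution can be reproduced with exact arithmetic. The Python sketch below (the function name `row_echelon` is illustrative) performs forward elimination, scaling each pivot to 1 as in the text, and recovers the matrix $U$, including the entry $\frac{5}{6}$. It then checks that appending $x_4 = (0, 0, 0, 1)$, that is $t = 1$, yields four nonzero rows in echelon form, hence a basis for $\mathbb{R}^4$.

```python
from fractions import Fraction

def row_echelon(rows):
    """Row-echelon form with leading 1's (forward elimination only), over the rationals."""
    M = [[Fraction(v) for v in row] for row in rows]
    m, n = len(M), len(M[0])
    r = 0
    for c in range(n):
        if r == m:
            break
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [v / piv for v in M[r]]          # make the leading entry 1
        for i in range(r + 1, m):               # clear the entries below it
            f = M[i][c]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return M

A = [[1, -2, 5, -3], [0, 1, 1, 4], [1, 0, 1, 0]]
U = row_echelon(A)
print([[str(e) for e in row] for row in U])
# [['1', '-2', '5', '-3'], ['0', '1', '1', '4'], ['0', '0', '1', '5/6']]
basis_W = [row for row in U if any(row)]
extended = row_echelon(basis_W + [[0, 0, 0, 1]])
print(all(any(row) for row in extended))   # True: four nonzero rows
```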


Problem 2.12 Let $W$ be a subspace of a vector space $V$. Show that if $\dim W = \dim V$, then $W = V$.


Problem 2.13 Find a basis and the dimension of each of the following subspaces of the space $M_{n\times n}(\mathbb{R})$ of all $n \times n$ matrices.

(1) The space of all n x n diagonal matrices whose traces are zero. 

(2) The space of all n x n symmetric matrices. 

(3) The space of all n x n skew-symmetric matrices. 


As a direct consequence of Theorem 2.11 and the definition of the direct 
sum of subspaces, one can show the following corollary. 


Corollary 2.13 For any subspace U of V, there is a subspace W of V such that V = U ⊕ W.


Proof: Choose a basis {u1, ..., uk} for U, and extend it to a basis {u1, ..., uk, uk+1, ..., un} for V. Then the subspace W spanned by {uk+1, ..., un} satisfies the requirement: U + W = V because the whole set spans V, and U ∩ W = {0} because the whole set is linearly independent.
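The proof of Corollary 2.13 is constructive. Below is a minimal Python sketch of it in R^n, under the assumption that the candidate vectors used to extend the basis of U are the standard basis vectors e1, ..., en, tried in that order (any spanning set of R^n would do). The names `rank` and `complement_basis` are hypothetical helpers, not taken from the text.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix by forward Gaussian elimination (exact Fractions)."""
    R = [[Fraction(v) for v in row] for row in rows]
    m, n = len(R), len(R[0])
    r = 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, m):          # clear the entries below the pivot
            f = R[i][c] / R[r][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == m:
            break
    return r

def complement_basis(U_basis, n):
    """Extend a linearly independent list U_basis in R^n by standard basis
    vectors e1, ..., en, keeping only those that increase the rank.
    The added vectors span a subspace W with R^n = U ⊕ W."""
    chosen = [list(u) for u in U_basis]
    W_basis = []
    for j in range(n):
        e = [1 if k == j else 0 for k in range(n)]
        if rank(chosen + [e]) > len(chosen):
            chosen.append(e)
            W_basis.append(e)
    return W_basis

# U = span{(1, 1, 0), (0, 1, 1)} in R^3.
print(complement_basis([[1, 1, 0], [0, 1, 1]], 3))   # [[1, 0, 0]]
```

A complement is far from unique; a different order of candidate vectors can give a different W of the same dimension.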


Definition 2.8 Let U and W be two subspaces of a vector space V such that V = U ⊕ W. Then W is called a complementary subspace of U in V. A subspace of a vector space is called a hyperspace if the dimension of a complementary subspace is 1.


Note that all the complementary subspaces of U in V have the same dimension, namely dim V − dim U. For a linear equation a1x1 + a2x2 + ⋯ + anxn = 0 with not all ai zero, the solution space is a hyperspace (see Example 2.3 and Theorem 5.16).


Problem 2.14 Let {v1, v2, ..., vn} be a basis for a vector space V and let Wi = {rvi : r ∈ R} be the subspace of V spanned by vi. Show that V = W1 ⊕ W2 ⊕ ⋯ ⊕ Wn.


72 Chapter 2. Vector Spaces 


2.5 Row and column spaces 


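In this section, we look at systems of linear equations in terms of vector spaces. Note that an m × n matrix A can be written in terms of its row vectors or its column vectors as follows:

$$
A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}
= \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \vdots \\ \mathbf{r}_m \end{bmatrix}
= \begin{bmatrix} \mathbf{c}_1 & \mathbf{c}_2 & \cdots & \mathbf{c}_n \end{bmatrix},
$$

where ri is the i-th row vector of A in R^n, and cj is the j-th column vector of A in R^m.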


Definition 2.9 Let A be an m × n matrix with row vectors {r1, r2, ..., rm} and column vectors {c1, c2, ..., cn}.
(1) The row space of A is the subspace of R^n spanned by the row vectors {r1, r2, ..., rm}, denoted by R(A).
(2) The column space of A is the subspace of R^m spanned by the column vectors {c1, c2, ..., cn}, denoted by C(A).
(3) The solution set of the homogeneous equation Ax = 0 is called the null space of A, denoted by N(A).


Since the rows of A are just the columns of A^T and the columns of A are the rows of A^T,

$$
\mathcal{R}(A) = \mathcal{C}(A^T) \quad\text{and}\quad \mathcal{C}(A) = \mathcal{R}(A^T).
$$

Note also that

$$
\mathcal{C}(A) = \{ A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n : \mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n \}.
$$

Thus, for a vector b ∈ R^m, the system Ax = b has a solution if and only if b ∈ C(A) ⊆ R^m. In other words, the column space C(A) is the set of vectors b ∈ R^m for which Ax = b has a solution. Note that the null space N(A) is a subspace of the n-space R^n. Its dimension is called the nullity of A.

The subspaces R(A), C(A), N(A) and N(A^T) are called the fundamental subspaces for A. The structure of all the solutions of the equation Ax = b can be understood from the bases and the dimensions of these subspaces. Note that maximal linearly independent subsets of the row vectors and of the column vectors of A are bases for the row space and the column space, respectively.

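Recall that the elementary row operations on a matrix A replace rows by linear combinations of rows or interchange rows of A itself, and the inverse operations are of the same kind. That is, if {r1, ..., rm} are the row vectors of an m × n matrix A, then the results of the elementary row operations are of the following three types:

$$
A_1 = \begin{bmatrix} \mathbf{r}_1 \\ \vdots \\ k\mathbf{r}_i \\ \vdots \\ \mathbf{r}_m \end{bmatrix} \ (k \neq 0), \qquad
A_2 = \begin{bmatrix} \mathbf{r}_1 \\ \vdots \\ \mathbf{r}_j \\ \vdots \\ \mathbf{r}_i \\ \vdots \\ \mathbf{r}_m \end{bmatrix} \ (i < j), \qquad
A_3 = \begin{bmatrix} \mathbf{r}_1 \\ \vdots \\ \mathbf{r}_i + k\mathbf{r}_j \\ \vdots \\ \mathbf{r}_m \end{bmatrix}.
$$

Each row of the new matrix is a linear combination of the rows of A, and conversely, since the operations are invertible; moreover, the new homogeneous system has exactly the same solutions as Ax = 0. Hence the elementary row operations on A change neither the row space nor the null space, so that if matrices A and B are row equivalent, then their row spaces and null spaces must be equal, i.e., R(A) = R(B) and N(A) = N(B).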

In particular, if U is the (reduced) row-echelon form of A, the nonzero 
rows containing leading 1’s form a basis (see Theorem 2.6) for the row space 
R(A) of A. This argument proves the following theorem: 


Theorem 2.14 Let U be a (reduced) row-echelon form of a matrix A. Then R(A) = R(U) and N(A) = N(U).


Moreover, if U has r nonzero row vectors containing leading 1's, then they form a basis for the row space R(A), so that the dimension of R(A) is r, which is the number of basic variables, and the dimension of N(A) is the number of free variables, n − r.


The following example illustrates how to find bases for the row space, 
the null space, and for the column space in general. 


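Example 2.19 Let A be the matrix given as

$$
A = \begin{bmatrix} 1 & 2 & 0 & 2 & 5 \\ -2 & -5 & 1 & -1 & -8 \\ 0 & -3 & 3 & 4 & 1 \\ 3 & 6 & 0 & -7 & 2 \end{bmatrix}
= \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \\ \mathbf{r}_3 \\ \mathbf{r}_4 \end{bmatrix}.
$$

Find bases for the row space R(A), the null space N(A), and the column space C(A) of A.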




Solution: (1) A basis for R(A): By Gauss-Jordan elimination on A, we get the reduced row-echelon form U:

$$
U = \begin{bmatrix} 1 & 0 & 2 & 0 & 1 \\ 0 & 1 & -1 & 0 & 1 \\ 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} \mathbf{v}_1 \\ \mathbf{v}_2 \\ \mathbf{v}_3 \\ \mathbf{v}_4 \end{bmatrix}.
$$

Since the three nonzero row vectors v1, v2, v3 of U are clearly linearly independent, they form a basis for the row space R(U) = R(A), so dim R(A) = 3. (Note that in the process of Gaussian elimination, we did not use a permutation matrix. This means that the three nonzero rows of U were obtained from the first three row vectors r1, r2, r3 of A, and the fourth row r4 of A turned out to be a linear combination of them. Thus the first three row vectors of A also form a basis for the row space.)

(2) A basis for N(A): It is enough to solve the homogeneous system Ux = 0, since N(A) = N(U). Since the first, second and fourth columns of U contain the leading 1's, the basic variables are x1, x2, x4, and the free variables are x3, x5. By assigning arbitrary values s and t to the free variables x3 and x5, we find the solution x of Ux = 0 as

$$
\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{bmatrix}
= \begin{bmatrix} -2s - t \\ s - t \\ s \\ -t \\ t \end{bmatrix}
= s \begin{bmatrix} -2 \\ 1 \\ 1 \\ 0 \\ 0 \end{bmatrix} + t \begin{bmatrix} -1 \\ -1 \\ 0 \\ -1 \\ 1 \end{bmatrix}
= s\,\mathbf{n}_s + t\,\mathbf{n}_t,
$$

where n_s = (−2, 1, 1, 0, 0) and n_t = (−1, −1, 0, −1, 1). In fact, n_s and n_t are the solutions obtained when (x3, x5) = (s, t) equals (1, 0) and (0, 1), respectively. They must be linearly independent, since (1, 0) and (0, 1), the (x3, x5)-coordinates of n_s and n_t respectively, are clearly linearly independent. Since any solution of Ux = 0 is a linear combination of them, the set {n_s, n_t} is a basis for the null space N(U) = N(A). Thus dim N(A) = 2, the number of free variables in Ux = 0.
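The recipe in (2), one null-space vector per free variable obtained by setting that variable to 1 and the other free variables to 0, can be sketched in Python as follows. The function name is hypothetical; it assumes its input is already in reduced row-echelon form (here the matrix U above), and it indexes columns from 0, so the pivot columns 1, 2, 4 of the text become 0, 1, 3.

```python
from fractions import Fraction

def null_space_from_rref(R, pivots, n):
    """Basis of N(R) for a reduced row-echelon matrix R with n columns.

    `pivots` lists the columns holding the leading 1's (basic variables).
    For each free variable we set it to 1, the other free variables to 0,
    and read each basic variable off the row containing its leading 1."""
    free = [j for j in range(n) if j not in pivots]
    basis = []
    for f in free:
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for row, p in zip(R, pivots):       # row with its leading 1 in column p
            x[p] = -Fraction(row[f])
        basis.append(x)
    return basis

U = [[1, 0, 2, 0, 1],
     [0, 1, -1, 0, 1],
     [0, 0, 0, 1, 1],
     [0, 0, 0, 0, 0]]
n_s, n_t = null_space_from_rref(U, pivots=[0, 1, 3], n=5)
print([int(v) for v in n_s])   # [-2, 1, 1, 0, 0]
print([int(v) for v in n_t])   # [-1, -1, 0, -1, 1]
```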

(3) A basis for C(A): Let c1, c2, c3, c4, c5 denote the column vectors of A in the given order. Since these column vectors of A span C(A), we only need to discard the columns that can be expressed as linear combinations of the other column vectors. But the linear dependence

$$
x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + x_3\mathbf{c}_3 + x_4\mathbf{c}_4 + x_5\mathbf{c}_5 = \mathbf{0}, \quad\text{i.e.,}\quad A\mathbf{x} = \mathbf{0},
$$

holds if and only if x = (x1, ..., x5) ∈ N(A). By taking x = n_s = (−2, 1, 1, 0, 0) or x = n_t = (−1, −1, 0, −1, 1), the basis vectors of N(A) given in (2), we obtain two nontrivial linear dependencies among the cj's:

$$
-2\mathbf{c}_1 + \mathbf{c}_2 + \mathbf{c}_3 = \mathbf{0}, \qquad -\mathbf{c}_1 - \mathbf{c}_2 - \mathbf{c}_4 + \mathbf{c}_5 = \mathbf{0},
$$

respectively. Hence the column vectors c3 and c5, corresponding to the free variables in Ax = 0, can be written as

$$
\mathbf{c}_3 = 2\mathbf{c}_1 - \mathbf{c}_2, \qquad \mathbf{c}_5 = \mathbf{c}_1 + \mathbf{c}_2 + \mathbf{c}_4.
$$

That is, the column vectors c3, c5 of A are linear combinations of the column vectors c1, c2, c4, which correspond to the basic variables in Ax = 0. Hence {c1, c2, c4} spans the column space C(A).

We claim that {c1, c2, c4} is linearly independent. Let Â = [c1 c2 c4] and let Û = [u1 u2 u4] be the reduced row-echelon form of Â. Applying to Â the same row operations as in (1) gives

$$
\hat{A} = \begin{bmatrix} 1 & 2 & 2 \\ -2 & -5 & -1 \\ 0 & -3 & 4 \\ 3 & 6 & -7 \end{bmatrix}
\;\longrightarrow\;
\hat{U} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}.
$$

Since all the columns u1, u2, u4 of Û contain leading 1's, they are linearly independent by Theorem 2.6, so Ûx = 0 has only the trivial solution. Hence so does Âx = 0, since N(Â) = N(Û). Thus {c1, c2, c4} is linearly independent, and so it is a basis for the column space C(A). This shows that dim C(A) = 3 = the number of basic variables, and that the column vectors of A corresponding to the basic variables in Ux = 0 form a basis for the column space C(A).


In summary, given a matrix A, we first find the (reduced) row-echelon form U of A by Gauss-Jordan elimination. Then a basis for R(A) = R(U) is the set of nonzero row vectors of U, and a basis for N(A) = N(U) can be found by solving Ux = 0, which is easy. On the other hand, one has to be careful, since C(U) ≠ C(A) in general: the column space of A is not preserved by Gauss-Jordan elimination. However, dim C(A) = dim C(U), and a basis for C(A) can be selected from the column vectors of A (not of U), namely those corresponding to the basic variables (or the leading 1's in U). To show that those column vectors indeed form a basis for C(A), we used a basis for the null space N(A) to eliminate the redundant columns.
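The summary above translates into a short procedure. The following Python sketch is an illustration under stated assumptions: exact rational arithmetic via `fractions.Fraction`, hypothetical helper names `rref` and `fundamental_bases`, and a small made-up 3 × 3 matrix rather than one from the text. Note that the column basis is read from A itself, at the pivot positions found in U.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form with exact Fractions; returns (R, pivots)."""
    R = [[Fraction(v) for v in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue                          # column c has no leading 1
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [v / lead for v in R[r]]       # scale to a leading 1
        for i in range(m):                    # clear column c in the other rows
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots

def fundamental_bases(A):
    """Bases for R(A), C(A) and N(A), following the summary in the text."""
    m, n = len(A), len(A[0])
    U, pivots = rref(A)
    row_basis = U[:len(pivots)]                                # nonzero rows of U
    col_basis = [[A[i][j] for i in range(m)] for j in pivots]  # columns of A, not U
    null_basis = []
    for f in (j for j in range(n) if j not in pivots):        # one per free variable
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for row, p in zip(U, pivots):
            x[p] = -row[f]
        null_basis.append(x)
    return row_basis, col_basis, null_basis

A = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]      # hypothetical example of rank 2
rows, cols, nulls = fundamental_bases(A)
print(len(rows), len(cols), len(nulls))    # 2 2 1
print(cols)                                # [[1, 2, 1], [2, 4, 0]]
```

Here the reduced form is U = [[1, 0, 1], [0, 1, 1], [0, 0, 0]], whose columns do not span C(A); only the pivot positions are taken from U.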

Note that a basis for the column space C(A) can also be found with elementary column operations, which is the same as finding a basis for the row space R(A^T) of A^T.


Problem 2.15 Let A be the matrix given in Example 2.19. Find a relation among a, b, c, d so that the vector x = (a, b, c, d) belongs to C(A).


Problem 2.16 Find bases for R(A) and N(A) of the matrix

$$
A = \begin{bmatrix} 1 & -2 & 0 & 0 & 3 \\ 2 & -5 & -3 & -2 & 6 \\ 0 & 5 & 15 & 10 & 0 \\ 2 & 6 & 18 & 8 & 6 \end{bmatrix}.
$$

Also find a basis for C(A) by finding a basis for R(A^T).


Problem 2.17 Let A and B be two n × n matrices. Show that AB = 0 if and only if the column space of B is a subspace of the null space of A.


Problem 2.18 Find an example of a matrix A and its row-echelon form U such that C(A) ≠ C(U).


2.6 Rank and nullity 


The argument in Example 2.19 is so general that it can be used to prove 
the following theorem, which is one of the most fundamental results in lin- 
ear algebra. The proof given here is just a repetition of the argument in 
Example 2.19 in a general case, and so it may be skipped at the reader’s 
discretion. 


Theorem 2.15 (The fundamental theorem) For any m × n matrix A, the row space and the column space of A have the same dimension; that is, dim R(A) = dim C(A).


Proof: Let dim R(A) = r and let U be the reduced row-echelon form of A. Then r is the number of nonzero rows (or, equivalently, columns) of U containing leading 1's, which is equal to the number of basic variables in Ux = 0 or Ax = 0. We shall prove that the r columns of A corresponding to the r leading 1's (or basic variables) form a basis for C(A), so that dim C(A) = r = dim R(A).




(1) They are linearly independent: Let Â denote the matrix whose columns are those of A corresponding to the r basic variables (or leading 1's in U), and let Û denote its reduced row-echelon form, so that Âx = 0 if and only if Ûx = 0. However, Ûx = 0 has only the trivial solution, since all the columns of Û contain leading 1's and so they are linearly independent by Theorem 2.6. Therefore Âx = 0 also has only the trivial solution, so the columns of Â are linearly independent.

(2) They span C(A): It is enough to show that the columns of A corresponding to the free variables are linear combinations of those corresponding to the basic variables (see Example 2.19). Let {x_{i_1}, x_{i_2}, ..., x_{i_k}} be the set of free variables. By assigning the value 1 to x_{i_j} and 0 to all the other free variables, one can find a nontrivial solution of

$$
A\mathbf{x} = x_1\mathbf{c}_1 + x_2\mathbf{c}_2 + \cdots + x_n\mathbf{c}_n = \mathbf{0}
$$

in which the column c_{i_j} of A, corresponding to x_{i_j} = 1, is written as a linear combination of the columns of A corresponding to the basic variables. This can be done for each j = 1, ..., k, so the columns of A corresponding to the free variables are redundant in the spanning set of C(A).


Remark: (1) In the proof of Theorem 2.15, once we have shown that the columns of Â are linearly independent as in (1), we may replace step (2) by the following argument: by (1) and Theorem 2.11, dim C(A) ≥ dim R(A) for any matrix A. Applying this inequality to A^T, we get dim C(A^T) ≥ dim R(A^T). Since C(A^T) = R(A) and R(A^T) = C(A), this gives dim C(A) ≤ dim R(A), which means dim C(A) = dim R(A). This also means that the r column vectors of Â span C(A), and so form a basis.

(2) Part (2) of the proof of Theorem 2.15 also shows that the reduced row-echelon form of a matrix is unique, as stated on page 12. In fact, suppose that U1 and U2 are two reduced row-echelon forms of an m × n matrix A. If there are no free variables, then it is quite clear that

$$
U_1 = U_2 = \begin{bmatrix} I_n \\ 0 \end{bmatrix}.
$$

Suppose that there is a free variable. Since U1x = 0 if and only if U2x = 0, one can easily check that the columns of U1 and U2 corresponding to the basic variables must be the same and of the form [0 ⋯ 0 1 0 ⋯ 0]^T. Moreover, this implies that the columns of U1 and U2 corresponding to the free variables must also be the same, so that U1 = U2.




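In summary, the following equalities are now clear from Theorems 2.14 and 2.15:

$$
\begin{aligned}
\dim \mathcal{N}(A) &= \dim \mathcal{N}(U) = \text{the number of free variables in } U\mathbf{x} = \mathbf{0}, \\
\dim \mathcal{R}(A) &= \dim \mathcal{R}(U) \\
&= \text{the number of nonzero row vectors of } U \\
&= \text{the maximal number of linearly independent row vectors of } A \\
&= \text{the number of basic variables in } U\mathbf{x} = \mathbf{0} \\
&= \text{the maximal number of linearly independent column vectors of } A \\
&= \dim \mathcal{C}(A).
\end{aligned}
$$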


Definition 2.10 For an m x n matrix A, the rank of A is defined to be 
the dimension of the row space (or the column space), denoted by rank A. 


Clearly, rank I_n = n and rank A = rank A^T. For an m × n matrix A, rank A = dim R(A) = dim C(A). Since dim R(A) ≤ m and dim C(A) ≤ n, we have the following corollary:


Corollary 2.16 If A is an m × n matrix, then rank A ≤ min{m, n}.


Since dim R(A) = dimC(A) = rank A is the number of basic variables, 
and dim N (A) = nullity of A is the number of free variables in Ax = 0, we 
have the following theorem. 


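Theorem 2.17 (Rank Theorem) For any m × n matrix A,

$$
\begin{aligned}
\dim \mathcal{R}(A) + \dim \mathcal{N}(A) &= \operatorname{rank} A + \operatorname{nullity} A = n, \\
\dim \mathcal{C}(A) + \dim \mathcal{N}(A^T) &= \operatorname{rank} A + \operatorname{nullity} A^T = m.
\end{aligned}
$$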


If dim N(A) = 0 (i.e., N(A) = {0}), then dim R(A) = n (i.e., R(A) = R^n), which means that A has n linearly independent rows and n linearly independent columns. In particular, if A is a square matrix of order n, then the row vectors are linearly independent if and only if the column vectors are linearly independent. Moreover, for such A, rank A = n means N(A) = {0}, i.e., Ax = 0 has only the trivial solution, and so by Theorem 1.9 we get the following corollary.


Corollary 2.18 Let A be an n × n square matrix. Then A is invertible if and only if rank A = n.




Example 2.20 For the 4 × 5 matrix

$$
A = \begin{bmatrix} 1 & 2 & 0 & 2 & 1 \\ -1 & -2 & 1 & 1 & 0 \\ 1 & 2 & -3 & -7 & 2 \\ 1 & 2 & -2 & -4 & 3 \end{bmatrix},
$$

find the rank and the nullity of A.


Solution: Gaussian elimination gives

$$
U = \begin{bmatrix} 1 & 2 & 0 & 2 & 1 \\ 0 & 0 & 1 & 3 & 1 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
$$

The three nonzero rows containing leading 1's in U form a basis for R(U) = R(A). Note that x1, x3 and x5 are the basic variables in Ux = 0, since the first, third and fifth columns of U contain leading 1's. Thus the three columns c1 = (1, −1, 1, 1), c3 = (0, 1, −3, −2) and c5 = (1, 0, 2, 3) of A (not the corresponding columns of U), corresponding to the basic variables x1, x3 and x5, form a basis for C(A). Therefore rank A = dim R(A) = dim C(A) = 3, the nullity of A is dim N(A) = 5 − 3 = 2, and dim N(A^T) = 4 − 3 = 1.
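A quick exact-arithmetic check of Example 2.20 in Python. This is a sketch under the assumption that only the pivot positions are needed; `rref_pivots` is a hypothetical helper, and columns are indexed from 0, so the pivots 0, 2, 4 correspond to c1, c3, c5. The last line also confirms rank A^T = rank A, as the Rank Theorem requires.

```python
from fractions import Fraction

def rref_pivots(rows):
    """Pivot columns of the reduced row-echelon form (exact arithmetic)."""
    R = [[Fraction(v) for v in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue                          # free variable: no leading 1 here
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [v / lead for v in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return pivots

A = [[1, 2, 0, 2, 1],
     [-1, -2, 1, 1, 0],
     [1, 2, -3, -7, 2],
     [1, 2, -2, -4, 3]]
m, n = len(A), len(A[0])
pivots = rref_pivots(A)
rank = len(pivots)
AT = [list(col) for col in zip(*A)]          # transpose of A
print(pivots)                                # [0, 2, 4], i.e. columns c1, c3, c5
print(rank, n - rank, m - rank)              # rank, nullity, dim N(A^T): 3 2 1
print(len(rref_pivots(AT)) == rank)          # True
```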


Problem 2.19 Find the nullity and the rank of each of the following matrices:

$$
(1)\ A = \begin{bmatrix} 1 & 3 & 1 & 7 \\ 2 & 3 & -1 & 9 \\ -1 & -2 & 0 & -5 \end{bmatrix},
$$

(2) A = [entries illegible in the source].


For each of the matrices, show that dim R(A) = dim C(A) directly by finding their 
bases. 


Problem 2.20 Show that a system of linear equations Ax = b has a solution if 
and only if rank A = rank [A b], where [A b] denotes the augmented matrix for 
Ax =b. 


Theorem 2.19 For any two matrices A and B for which AB can be defined,
(1) N(AB) ⊇ N(B),
(2) N((AB)^T) ⊇ N(A^T),
(3) C(AB) ⊆ C(A),
(4) R(AB) ⊆ R(B).




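Proof: (1) is clear, since Bx = 0 implies (AB)x = A(Bx) = 0, and (2) follows by applying (1) to (AB)^T = B^T A^T.
(3) For an m × n matrix A and an n × p matrix B,

$$
\mathcal{C}(AB) = \{AB\mathbf{x} : \mathbf{x} \in \mathbb{R}^p\} \subseteq \{A\mathbf{y} : \mathbf{y} \in \mathbb{R}^n\} = \mathcal{C}(A),
$$

because Bx ∈ R^n for any x ∈ R^p.
(4) R(AB) = C((AB)^T) = C(B^T A^T) ⊆ C(B^T) = R(B), by (3).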


Corollary 2.20 rank(AB) ≤ min{rank A, rank B}.
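Corollary 2.20 and part (3) of Theorem 2.19 are easy to test numerically. The following sketch uses two small made-up matrices (not from the text) and hypothetical helpers `rank` and `matmul`; the inclusion C(AB) ⊆ C(A) is tested by checking that appending the columns of AB to A does not increase the rank.

```python
from fractions import Fraction

def rank(rows):
    """Rank by forward Gaussian elimination with exact Fractions."""
    R = [[Fraction(v) for v in row] for row in rows]
    m, n = len(R), len(R[0])
    r = 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, m):
            f = R[i][c] / R[r][c]
            R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == m:
            break
    return r

def matmul(A, B):
    """Matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical matrices: A is 3 x 2 of rank 2, B is 2 x 3 of rank 1.
A = [[1, 0], [0, 1], [1, 1]]
B = [[1, 2, 3], [2, 4, 6]]
AB = matmul(A, B)
print(rank(AB), rank(A), rank(B))            # 1 2 1
# C(AB) ⊆ C(A): the augmented matrix [A | AB] has the same rank as A.
A_with_AB = [ra + rab for ra, rab in zip(A, AB)]
print(rank(A_with_AB) == rank(A))            # True
```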


In some particular cases, the equality holds. In fact, it will be shown later in Theorem 5.19 that for any square matrix A, rank(A^T A) = rank A = rank(AA^T). The following problem illustrates another such case.


Problem 2.21 Let A be an invertible square matrix. Show that, for any matrix 
B, rank(AB) = rank B = rank(BA). 


Theorem 2.21 Let A be an m × n matrix of rank r. Then
(1) for every submatrix C of A, rank C ≤ r, and
(2) the matrix A has at least one r × r submatrix of rank r, that is, A has an invertible submatrix of order r.


Proof: (1) Consider an intermediate matrix B which is obtained from A by removing the rows that are not wanted in C. Then clearly R(B) ⊆ R(A), and hence rank B ≤ rank A. Moreover, since the columns of C are taken from those of B, C(C) ⊆ C(B) and rank C ≤ rank B.

(2) Note that one can find r linearly independent row vectors of A, which form a basis for the row space of A. Let B be the matrix whose row vectors consist of these vectors. Then rank B = r, and the column space of B must be of dimension r. By taking r linearly independent column vectors of B, one can find an r × r submatrix C of A with rank r.


Problem 2.22 Prove that the rank of a matrix is equal to the largest order of its 
invertible submatrices. 


Problem 2.23 For each of the matrices given in Problem 2.19, find an invertible 
submatrix of the largest order. 


Problem 2.24 Let A be an m × n matrix of rank r. Find two matrices L of size m × r and U of size r × n, both of rank r, such that A = LU. In fact, the r columns of L are a basis for C(A) and the r rows of U are a basis for R(A). In particular, every matrix A of rank one is a product of a column matrix and a row matrix.




2.7 Bases for subspaces 


Let us try to find some relations among the dimensions of subspaces. Let U and W be two subspaces of a vector space V. Let {v1, v2, ..., vs} be a basis for U ∩ W. Extend this basis to a basis {v1, ..., vs, u_{s+1}, ..., uk} for U and to a basis {v1, ..., vs, w_{s+1}, ..., wℓ} for W. Then any vector in U + W is of the form

$$
\begin{aligned}
&a_1\mathbf{v}_1 + \cdots + a_s\mathbf{v}_s + a_{s+1}\mathbf{u}_{s+1} + \cdots + a_k\mathbf{u}_k
 + b_1\mathbf{v}_1 + \cdots + b_s\mathbf{v}_s + b_{s+1}\mathbf{w}_{s+1} + \cdots + b_\ell\mathbf{w}_\ell \\
&\quad = c_1\mathbf{v}_1 + \cdots + c_s\mathbf{v}_s + a_{s+1}\mathbf{u}_{s+1} + \cdots + a_k\mathbf{u}_k + b_{s+1}\mathbf{w}_{s+1} + \cdots + b_\ell\mathbf{w}_\ell,
\end{aligned}
$$

where ci = ai + bi, which means that {v1, ..., vs, u_{s+1}, ..., uk, w_{s+1}, ..., wℓ} spans U + W. It is an easy exercise to show that this set is also linearly independent. Therefore, we have proved the following theorem:


Theorem 2.22 Let U and W be two subspaces of a vector space V. Then the following holds:

dim(U + W) = dim U + dim W − dim(U ∩ W).


Thus dim(U + W) ≤ dim U + dim W, and in general the two sides differ. In particular, dim(U + W) = dim U + dim W if and only if U ∩ W = {0}; in this case, U + W = U ⊕ W. Conversely, if V = U ⊕ W, then dim V = dim U + dim W.

In practice, for two given subspaces U and W of the n-space V = R^n with bases α = {u1, ..., uk} and β = {w1, ..., wℓ}, respectively, it takes a little work to find bases for U + W and U ∩ W. We now introduce some ways of finding them.

Let Q be the n × (k + ℓ) matrix whose columns are the basis vectors in α and β:

$$
Q = \begin{bmatrix} \mathbf{u}_1 & \cdots & \mathbf{u}_k & \mathbf{w}_1 & \cdots & \mathbf{w}_\ell \end{bmatrix}_{n \times (k+\ell)}.
$$


Theorem 2.23 Let U and W be two subspaces of R^n, and let Q be the matrix defined above. Then

(1) C(Q) = U + W, so that a basis for the column space C(Q) is a basis for U + W.

(2) N(Q) can be identified with U ∩ W, so that dim(U ∩ W) = dim N(Q).




Proof: (1) It is clear that C(Q) = U + W, since the columns of Q span U + W.
(2) Let x = (a1, ..., ak, b1, ..., bℓ) ∈ N(Q) ⊆ R^{k+ℓ}. Then

$$
Q\mathbf{x} = a_1\mathbf{u}_1 + \cdots + a_k\mathbf{u}_k + b_1\mathbf{w}_1 + \cdots + b_\ell\mathbf{w}_\ell = \mathbf{0},
$$

from which we get a1u1 + ⋯ + akuk = −(b1w1 + ⋯ + bℓwℓ). If we set

$$
\mathbf{y} = a_1\mathbf{u}_1 + \cdots + a_k\mathbf{u}_k \in U, \qquad \text{so that} \qquad \mathbf{y} = -(b_1\mathbf{w}_1 + \cdots + b_\ell\mathbf{w}_\ell) \in W,
$$

then y ∈ U ∩ W. This means that to each x ∈ N(Q) there corresponds a vector y in U ∩ W. On the other hand, if y ∈ U ∩ W, then y can be written as a linear combination of each basis separately:

$$
\mathbf{y} = a_1\mathbf{u}_1 + \cdots + a_k\mathbf{u}_k \in U, \qquad \mathbf{y} = b_1\mathbf{w}_1 + \cdots + b_\ell\mathbf{w}_\ell \in W,
$$

for some a1, ..., ak and b1, ..., bℓ. Let x = (a1, ..., ak, −b1, ..., −bℓ) ∈ R^{k+ℓ}. Then it is quite clear that Qx = 0, i.e., x ∈ N(Q). Therefore, this correspondence between vectors x in N(Q) ⊆ R^{k+ℓ} and vectors y in U ∩ W ⊆ R^n is a one-to-one correspondence between the sets N(Q) and U ∩ W.

Moreover, if xi corresponds to yi for i = 1, 2, then one can easily check that x1 + x2 corresponds to y1 + y2, and kx1 corresponds to ky1. This means that the two vector spaces N(Q) and U ∩ W can be identified as vector spaces (see Section 3.2 for a more precise meaning of this identification). In particular, the set corresponding to a basis for N(Q) is a basis for U ∩ W: that is, if the set of vectors

$$
\mathbf{x}_1 = (a_{11}, \ldots, a_{1k}, b_{11}, \ldots, b_{1\ell}), \quad \ldots, \quad \mathbf{x}_s = (a_{s1}, \ldots, a_{sk}, b_{s1}, \ldots, b_{s\ell})
$$

is a basis for N(Q), then the set of vectors

$$
\begin{aligned}
\mathbf{y}_1 &= a_{11}\mathbf{u}_1 + \cdots + a_{1k}\mathbf{u}_k = -(b_{11}\mathbf{w}_1 + \cdots + b_{1\ell}\mathbf{w}_\ell), \\
&\ \ \vdots \\
\mathbf{y}_s &= a_{s1}\mathbf{u}_1 + \cdots + a_{sk}\mathbf{u}_k = -(b_{s1}\mathbf{w}_1 + \cdots + b_{s\ell}\mathbf{w}_\ell)
\end{aligned}
$$

is a basis for U ∩ W, and vice versa. This implies that

dim N(Q) = dim(U ∩ W).




Theorem 2.23 also proves Theorem 2.22: by the Rank Theorem 2.17, we have

rank Q + nullity Q = k + ℓ.

Since dim C(Q) = dim(U + W), dim N(Q) = dim(U ∩ W), dim U = k and dim W = ℓ, we get, for any subspaces U and W of the n-space R^n,

dim(U + W) + dim(U ∩ W) = dim U + dim W.


Example 2.21 Let U and W be two subspaces of R^5 with bases

u1 = (1, 3, −2, 2, 3),   w1 = (2, 3, −1, −2, 9),
u2 = (1, 4, −3, 4, 2),   w2 = (1, 5, −6, 6, 1),
u3 = (1, 3, 0, 2, 3),    w3 = (2, 4, 4, 2, 8),

respectively. Find bases for U + W and U ∩ W.


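Solution: The matrix Q takes the following form:

$$
Q = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 & \mathbf{u}_3 & \mathbf{w}_1 & \mathbf{w}_2 & \mathbf{w}_3 \end{bmatrix}
= \begin{bmatrix} 1 & 1 & 1 & 2 & 1 & 2 \\ 3 & 4 & 3 & 3 & 5 & 4 \\ -2 & -3 & 0 & -1 & -6 & 4 \\ 2 & 4 & 2 & -2 & 6 & 2 \\ 3 & 2 & 3 & 9 & 1 & 8 \end{bmatrix}.
$$

Gauss-Jordan elimination gives

$$
G = \begin{bmatrix} 1 & 0 & 0 & 5 & 0 & 0 \\ 0 & 1 & 0 & -3 & 2 & 0 \\ 0 & 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
$$

From this, one can directly see that dim(U + W) = 4: the columns u1, u2, u3, w3 corresponding to the basic variables in Qx = 0 (or to the leading 1's in G) form a basis for C(Q) = U + W. Moreover, dim N(Q) = dim(U ∩ W) = 2, since there are two free variables x4 and x5 in Qx = 0.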

To find a basis for U ∩ W, we solve Gx = 0 for x = (x1, x2, x3, 1, 0, x6) and x = (x1, x2, x3, 0, 1, x6). After a simple computation, we obtain a basis for N(Q):

x1 = (−5, 3, 0, 1, 0, 0) and x2 = (0, −2, 1, 0, 1, 0).

From Qx1 = 0 and Qx2 = 0, we obtain two equations:

$$
-5\mathbf{u}_1 + 3\mathbf{u}_2 + \mathbf{w}_1 = \mathbf{0}, \qquad -2\mathbf{u}_2 + \mathbf{u}_3 + \mathbf{w}_2 = \mathbf{0}.
$$




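Therefore {y1, y2} is a basis for U ∩ W, where

$$
\mathbf{y}_1 = 5\mathbf{u}_1 - 3\mathbf{u}_2 = \begin{bmatrix} 2 \\ 3 \\ -1 \\ -2 \\ 9 \end{bmatrix} = \mathbf{w}_1, \qquad
\mathbf{y}_2 = 2\mathbf{u}_2 - \mathbf{u}_3 = \begin{bmatrix} 1 \\ 5 \\ -6 \\ 6 \\ 1 \end{bmatrix} = \mathbf{w}_2.
$$

Clearly, one can check that

dim(U + W) + dim(U ∩ W) = 4 + 2 = 3 + 3 = dim U + dim W.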
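Theorem 2.23 and Example 2.21 can be reproduced with a short Python sketch. It is an illustration only: exact arithmetic via `fractions.Fraction`, hypothetical helper names, and 0-based column indices. For each null vector x = (a1, ..., ak, b1, ..., bℓ) the sketch takes y = b1w1 + ⋯ + bℓwℓ, which is the negative of the proof's a1u1 + ⋯ + akuk; for Example 2.21 this gives exactly y1 = w1 and y2 = w2, as above.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form with exact Fractions; returns (R, pivots)."""
    R = [[Fraction(v) for v in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [v / lead for v in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots

def sum_and_intersection(U_basis, W_basis):
    """Bases for U + W and U ∩ W via Q = [u1 ... uk w1 ... wl] (Theorem 2.23).

    Pivot columns of Q give a basis of C(Q) = U + W. Each basis vector
    x = (a1, ..., ak, b1, ..., bl) of N(Q) yields y = b1 w1 + ... + bl wl,
    which equals -(a1 u1 + ... + ak uk) and so lies in U ∩ W."""
    k, n = len(U_basis), len(U_basis[0])
    cols = list(U_basis) + list(W_basis)
    Q = [[col[i] for col in cols] for i in range(n)]   # basis vectors as columns
    G, pivots = rref(Q)
    sum_basis = [cols[j] for j in pivots]
    intersection = []
    for f in (j for j in range(len(cols)) if j not in pivots):
        x = [Fraction(0)] * len(cols)
        x[f] = Fraction(1)                 # this free variable is 1, the others 0
        for row, p in zip(G, pivots):
            x[p] = -row[f]                 # basic variables read off G
        b = x[k:]
        intersection.append([sum(bj * w[i] for bj, w in zip(b, W_basis)) for i in range(n)])
    return sum_basis, intersection

U_basis = [(1, 3, -2, 2, 3), (1, 4, -3, 4, 2), (1, 3, 0, 2, 3)]
W_basis = [(2, 3, -1, -2, 9), (1, 5, -6, 6, 1), (2, 4, 4, 2, 8)]
S, I = sum_and_intersection(U_basis, W_basis)
print(len(S), len(I))                          # 4 2
print([[int(v) for v in y] for y in I])        # [[2, 3, -1, -2, 9], [1, 5, -6, 6, 1]]
```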


Remark: (Another method for finding a basis for U ∩ W) Given two subspaces U and W of R^n with bases α and β, respectively, there is another method for finding a basis for U ∩ W. By placing the basis vectors in the rows, one can construct a k × n matrix A from α and an ℓ × n matrix B from β. Then U = R(A) and W = R(B). Construct a (k + 1) × n matrix Ã by adding an unknown vector x = (x1, x2, ..., xn) ∈ R^n below the bottom row of A:

$$
\tilde{A} = \begin{bmatrix} A \\ \mathbf{x} \end{bmatrix},
$$

and define the matrix B̃ similarly. Then R(Ã) = R(A) and R(B̃) = R(B) if and only if x ∈ U ∩ W = R(A) ∩ R(B). Since the rows of A are linearly independent, R(Ã) = R(A) means that the same Gaussian elimination must turn Ã into the row-echelon form of A followed by a zero row. Thus the last row of the row-echelon form of Ã must be zero, which provides a system of linear equations for x = (x1, x2, ..., xn). By the same argument applied to B and B̃, one gets another system of linear equations for the same x. The common solutions of these two systems together form U ∩ W, and a basis for this solution space is a basis for U ∩ W. The following example illustrates how one can apply this argument to find bases for U + W and U ∩ W.


Example 2.22 Let U be the subspace of R^5 spanned by

u1 = (1, 3, −2, 2, 3),
u2 = (1, 4, −3, 4, 2),
u3 = (2, 3, −1, −2, 10),

and W the subspace spanned by

w1 = (1, 3, 0, 2, 1),
w2 = (1, 5, −6, 6, 3),
w3 = (2, 5, 3, 2, 1).




Find a basis for U + W and for U ∩ W.


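Solution: Note that the matrix A whose row vectors are the ui's is reduced to the row-echelon form

$$
\begin{bmatrix} 1 & 3 & -2 & 2 & 3 \\ 0 & 1 & -1 & 2 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix},
$$

so that dim U = 3. Similarly, the matrix B whose row vectors are the wi's is reduced to the row-echelon form

$$
\begin{bmatrix} 1 & 3 & 0 & 2 & 1 \\ 0 & 2 & -6 & 4 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix},
$$

so that dim W = 2.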
Now, if Q denotes the 6 × 5 matrix whose rows are the ui's and wi's, then U + W = R(Q). By Gaussian elimination, Q becomes

$$
\begin{bmatrix} 1 & 3 & -2 & 2 & 3 \\ 0 & 1 & -1 & 2 & -1 \\ 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}
$$

with the zero rows excluded. Thus the four nonzero row vectors

(1, 3, −2, 2, 3), (0, 1, −1, 2, −1), (0, 0, 1, 0, −1), (0, 0, 0, 0, 1)

form a basis for U + W, so that dim(U + W) = 4.

We now find a basis for U ∩ W. A vector x = (x1, x2, x3, x4, x5) ∈ R^5 is contained in U ∩ W if and only if x is contained in both the row space of A and that of B.

Let Ã be A with x attached as the last row:

$$
\tilde{A} = \begin{bmatrix} 1 & 3 & -2 & 2 & 3 \\ 1 & 4 & -3 & 4 & 2 \\ 2 & 3 & -1 & -2 & 10 \\ x_1 & x_2 & x_3 & x_4 & x_5 \end{bmatrix}.
$$

Then by the same Gaussian elimination, Ã is reduced to

$$
\begin{bmatrix} 1 & 3 & -2 & 2 & 3 \\ 0 & 1 & -1 & 2 & -1 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & -x_1 + x_2 + x_3 & 4x_1 - 2x_2 + x_4 & 0 \end{bmatrix}.
$$




Therefore, x ∈ R(A) = U if and only if R(Ã) = R(A). By comparing the row vectors of the row-echelon form of Ã with those of A, one can say that x ∈ R(A) if and only if the last row vector of the row-echelon form of Ã is the zero vector, that is, x is a solution of the homogeneous system of equations

$$
\begin{aligned}
-x_1 + x_2 + x_3 &= 0, \\
4x_1 - 2x_2 + x_4 &= 0.
\end{aligned}
$$

The same calculation with B̃ gives another homogeneous system of linear equations for x:

$$
\begin{aligned}
-9x_1 + 3x_2 + x_3 &= 0, \\
4x_1 - 2x_2 + x_4 &= 0, \\
2x_1 - x_2 + x_5 &= 0.
\end{aligned}
$$

Solving these two homogeneous systems together yields

U ∩ W = {t(1, 4, −3, 4, 2) : t ∈ R}.

Hence {(1, 4, −3, 4, 2)} is a basis for U ∩ W and dim(U ∩ W) = 1.
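The Remark's method can also be carried out mechanically: each entry of the appended row x is a linear form in (x1, ..., xn), so elimination can track its coefficient list. The Python sketch below is an illustration under stated assumptions: it uses the reduced row-echelon form (also a row-echelon form) instead of a hand-picked one, exact `Fraction` arithmetic, and hypothetical helper names. For Example 2.22 it produces the same two and three equations as above and the basis vector up to a scalar multiple.

```python
from fractions import Fraction

def rref(rows):
    """Reduced row-echelon form with exact Fractions; returns (R, pivots)."""
    R = [[Fraction(v) for v in row] for row in rows]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][c]
        R[r] = [v / lead for v in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots

def membership_equations(rows):
    """Linear conditions on x = (x1, ..., xn) saying that x lies in R(A), A = rows.

    Append the unknown row x below an echelon form of A and eliminate.
    Entry j of the last row is a linear form in x, stored as its coefficient
    list; x lies in R(A) exactly when every remaining form vanishes."""
    R, pivots = rref(rows)
    n = len(rows[0])
    last = [[Fraction(int(i == j)) for i in range(n)] for j in range(n)]  # entry j is x_j
    for row, c in zip(R, pivots):
        f = last[c]          # multiplier for this pivot row (its leading entry is 1)
        last = [[a - fk * row[j] for a, fk in zip(last[j], f)] for j in range(n)]
    return [form for form in last if any(form)]

def intersection_basis(A_rows, B_rows):
    """Basis of R(A) ∩ R(B): solve both systems of conditions together."""
    n = len(A_rows[0])
    M = membership_equations(A_rows) + membership_equations(B_rows)
    if not M:                                   # no conditions at all: R^n
        return [[Fraction(int(i == j)) for i in range(n)] for j in range(n)]
    G, pivots = rref(M)
    basis = []
    for f in (j for j in range(n) if j not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for row, p in zip(G, pivots):
            x[p] = -row[f]
        basis.append(x)
    return basis

A = [[1, 3, -2, 2, 3], [1, 4, -3, 4, 2], [2, 3, -1, -2, 10]]   # rows u1, u2, u3
B = [[1, 3, 0, 2, 1], [1, 5, -6, 6, 3], [2, 5, 3, 2, 1]]       # rows w1, w2, w3
basis = intersection_basis(A, B)
print(len(basis))                          # 1, so dim(U ∩ W) = 1
print([int(2 * v) for v in basis[0]])      # [1, 4, -3, 4, 2]
```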


Problem 2.25 Let U and W be the subspaces of the vector space P3(IR) spanned 


"y vule) = 3 =- z + 4e + r’, 
vlr) = 5 + 52? + 2, 
vlz) = 5 = 5a + 102? + 32%, 
and 
wi(z) = 9 — 3r + 3a? + 2r’, 
wz) = 5- r + 2 + r, 
wlz) = 6 + 4? + r’, 


respectively. Find the dimensions and bases for U + W and U N W. 


Problem 2.26 Let

U = {(x, y, z, u) ∈ R^4 : y + z + u = 0},
W = {(x, y, z, u) ∈ R^4 : x + y = 0, z = 2u}

be two subspaces of R^4. Find bases for U, W, U + W, and U ∩ W.




2.8 Invertibility 


In Chapter 1, we have seen that a non-square matrix may have only one-sided (right or left) inverses. In this section, it will be shown that the existence of a right inverse of a matrix A implies the existence of solutions of a system Ax = b, while the existence of a left inverse implies their uniqueness.


Theorem 2.24 (Existence) Let A be an m × n matrix. Then the following statements are equivalent.


(1) For each b ∈ R^m, Ax = b has at least one solution x in R^n.
(2) The column vectors of A span R^m, i.e., C(A) = R^m.

(3) rank A = m (hence m ≤ n).

(4) There exists a right inverse B of A such that AB = I_m.


Proof: (1) ⇔ (2): In general, C(A) ⊆ R^m. For any b ∈ R^m, there is a solution x ∈ R^n of Ax = b if and only if b is a linear combination of the column vectors of A, i.e., b ∈ C(A). Thus (1) holds if and only if R^m = C(A).

(2) ⇔ (3): C(A) = R^m if and only if dim C(A) = m (see Problem 2.12). But dim C(A) = rank A = dim R(A) ≤ min{m, n}, so in this case m ≤ n.

(3) ⇒ (4): If rank A = m, then (AA^T)^{-1} exists by Remark (1) below Theorem 5.20, so that A^T(AA^T)^{-1} is a right inverse of A.

(4) ⇒ (3): If B is a right inverse of A such that AB = I_m, then rank(AB) = rank I_m = m, so that rank A ≥ m by Corollary 2.20. Since also rank A ≤ m, this means that rank A = m. Note that if B is a right inverse of A, then for any b ∈ R^m, x = Bb is a solution of Ax = b.
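The right inverse constructed in (3) ⇒ (4) can be computed directly. A minimal Python sketch, assuming exact rational arithmetic and a made-up 2 × 3 matrix of rank 2; the helper names are hypothetical, and the inverse of AA^T is computed here by Gauss-Jordan elimination on [AA^T | I], which is an assumed choice rather than a method from the text.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(M):
    """Inverse of a square matrix by Gauss-Jordan elimination on [M | I]."""
    n = len(M)
    R = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = next((i for i in range(c, n) if R[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        R[c], R[p] = R[p], R[c]
        lead = R[c][c]
        R[c] = [v / lead for v in R[c]]
        for i in range(n):
            if i != c and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[c])]
    return [row[n:] for row in R]

def right_inverse(A):
    """B = A^T (A A^T)^(-1); valid when rank A = m (Theorem 2.24, (3) => (4))."""
    At = transpose(A)
    return matmul(At, inverse(matmul(A, At)))

A = [[1, 0, 1], [0, 1, 1]]                 # hypothetical 2 x 3 matrix of rank 2
B = right_inverse(A)
print(matmul(A, B) == [[1, 0], [0, 1]])    # True: AB = I_2
b = [[4], [5]]                             # any b in R^2, written as a column
x = matmul(B, b)
print(matmul(A, x) == b)                   # True: x = Bb solves Ax = b
```

Since n > m here, Ax = b has infinitely many solutions; Bb is just one of them.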


Condition (3) implies that the m rows of A are linearly independent. Note that if C(A) ⊊ R^m, then Ax = b has no solution for b ∉ C(A).


Theorem 2.25 (Uniqueness) Let A be an m × n matrix. Then the following statements are equivalent.


(1) For each b ∈ R^m, Ax = b has at most one solution x in R^n.
(2) The column vectors of A are linearly independent.

(3) dim C(A) = rank A = n (hence n ≤ m).

(4) R(A) = R^n ≅ C(A).

(5) N(A) = {0}.

(6) There exists a left inverse C of A such that CA = I_n.




Proof: (1) ⇔ (2): Note that x = 0 is a solution of Ax = 0. (1) means that it is the only solution of Ax = 0, which is equivalent to (2), saying that the column vectors of A are linearly independent. Conversely, suppose that (2) holds and x1 and x2 are two solutions of Ax = b. Then A(x1 − x2) = 0 implies x1 = x2.

(2) ⇔ (3): Clear, because the n column vectors are linearly independent if and only if they form a basis for C(A), i.e., dim C(A) = n ≤ m.

(3) ⇔ (4): Clear, because dim R(A) = rank A = dim C(A) = n if and only if R(A) = R^n ≅ C(A) (see Problem 2.12).

(4) ⇔ (5): Clear, since dim R(A) + dim N(A) = n.

(3) ⇒ (6): Suppose that rank A = n. Then (A^T A)^{-1} exists by Theorem 5.19, so that (A^T A)^{-1}A^T is a left inverse of A.

(6) ⇒ (3): Let C be a left inverse of A, so that CA = I_n. Then rank(CA) = rank I_n = n implies rank A ≥ n by Corollary 2.20, which means rank A = n, since rank A ≤ n always.
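Dually, the left inverse from (3) ⇒ (6) can be sketched the same way. This is an illustration only, with a made-up 3 × 2 matrix of rank 2, hypothetical helper names, and Gauss-Jordan elimination assumed as the way to invert A^T A. When b lies in C(A), the unique solution of Ax = b is Cb; for b outside C(A) there is no solution at all, so Cb is not a solution in that case.

```python
from fractions import Fraction

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(M):
    """Inverse of a square matrix by Gauss-Jordan elimination on [M | I]."""
    n = len(M)
    R = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(M)]
    for c in range(n):
        p = next((i for i in range(c, n) if R[i][c] != 0), None)
        if p is None:
            raise ValueError("matrix is singular")
        R[c], R[p] = R[p], R[c]
        lead = R[c][c]
        R[c] = [v / lead for v in R[c]]
        for i in range(n):
            if i != c and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[c])]
    return [row[n:] for row in R]

def left_inverse(A):
    """C = (A^T A)^(-1) A^T; valid when rank A = n (Theorem 2.25, (3) => (6))."""
    At = transpose(A)
    return matmul(inverse(matmul(At, A)), At)

A = [[1, 0], [0, 1], [1, 1]]               # hypothetical 3 x 2 matrix of rank 2
C = left_inverse(A)
print(matmul(C, A) == [[1, 0], [0, 1]])    # True: CA = I_2
b = [[1], [2], [3]]                        # b = A(1, 2), so b lies in C(A)
print(matmul(C, b) == [[1], [2]])          # True: the unique solution is Cb
```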


Remark: (1) We have proved that an m × n matrix A has a right inverse if and only if rank A = m, while A has a left inverse if and only if rank A = n. Therefore, if m ≠ n, A cannot have both left and right inverses.

(2) Note that if A is a square matrix with m = n, then A has a right inverse if and only if m = rank A = n, which holds if and only if A has a left inverse. Moreover, in this case the two inverses are the same by Theorem 1.9. Therefore, a square matrix A has rank n if and only if A is invertible. This means that for a square matrix “Existence = Uniqueness”, and the ten statements listed in Theorems 2.24 and 2.25 are all equivalent. In particular, to show the invertibility of a square matrix, it is enough to show the existence of a one-sided inverse.


Problem 2.27 For each of the following matrices, discuss the number of possible 
solutions to the system of linear equations Ax = b for any b: 


1 3 -2 5 4 2 3 
()A=|1 4 13 5l, 2)A=| 3 -7 l, 
2 7 —3 6 13 —6 1 
been ee 11 2 
(3) A= 3 8 -7 -2 11 : (4) A= 5 : 3 P 
2 1 -9 —10 —3 


Summarizing all the results obtained so far about solvability of a system, 
one can obtain several characterizations of the invertibility of a square ma- 
trix. The following theorem is a collection of the results proved previously 
in Theorems 1.9, 2.24, and 2.25. 




Theorem 2.26 For a square matrix A of order n, the following statements are equivalent.


(1) A has a left inverse.
(2) Ax = 0 has only the trivial solution, i.e., N(A) = {0}.
(3) A is row equivalent to I_n.
(4) The rows of A are linearly independent.
(5) The rows of A span R^n, i.e., R(A) = R^n.
(6) All the pivots are nonzero, so that Gaussian elimination can be completed and PA = LDU, where all the pivots d_i ≠ 0.
(7) A is a product of elementary matrices.
(8) A is invertible.
(9) Ax = b has a solution for every b ∈ R^n.
(10) The columns of A are linearly independent.
(11) The columns of A span R^n, i.e., C(A) = R^n.
(12) rank A = n.
(13) A has a right inverse.


2.9 Applications 


2.9.1 Interpolation I 


In many scientific experiments, a scientist wants to find the precise functional 
relationship between input data and output data. That is, in his experiment, 
he/she puts various input values into the experimental device and obtains 
output values corresponding to those input values. After the experiment, 
what he/she gets is a table of inputs and outputs. The precise functional 
relationship might be very complicated, and sometimes it might be very hard 
or almost impossible to find the precise function. In this case, one thing we 
can do is to find a polynomial whose graph passes through each of the data 
points and comes very close to the function we wanted to find. That is, we 
are looking for a polynomial that approximates the precise function. Such 
a polynomial is called an interpolating polynomial. 

To find such a polynomial, let us begin with a set of given data: suppose that, for n + 1 distinct experimental input values x0, x1, ..., xn, we obtained n + 1 output values y0 = f(x0), y1 = f(x1), ..., yn = f(xn). The output values are supposed to be related to the inputs by a certain function f.




We wish to construct a polynomial $p(x)$ of degree less than or equal to $n$
which interpolates $f(x)$ at $x_0, x_1, \ldots, x_n$, i.e., $p(x_i) = y_i = f(x_i)$ for
$i = 0, 1, \ldots, n$.

Note that if there is such a polynomial, it must be unique. Indeed, if
$q(x)$ is another such polynomial, then $h(x) = p(x) - q(x)$ is also a polynomial
of degree less than or equal to $n$ vanishing at the $n+1$ distinct points
$x_0, x_1, \ldots, x_n$. Since a nonzero polynomial of degree at most $n$ has at most
$n$ roots, $h(x)$ must be the identically zero polynomial, so that
$p(x) = q(x)$ for all $x \in \mathbb{R}$.

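Now, the unique polynomial $p(x)$ can be found by solving a system of
linear equations: if we write $p(x) = a_0 + a_1x + \cdots + a_nx^n$, then we are
supposed to determine the coefficients $a_i$'s. The set of equations

$$p(x_i) = a_0 + a_1x_i + \cdots + a_nx_i^n = y_i = f(x_i),$$

for $i = 0, 1, \ldots, n$, constitutes a system of $n+1$ linear equations in the $n+1$
unknowns $a_i$'s:

$$\begin{bmatrix} 1 & x_0 & \cdots & x_0^n \\ 1 & x_1 & \cdots & x_1^n \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & \cdots & x_n^n \end{bmatrix}
\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}
= \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_n \end{bmatrix}.$$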


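The coefficient matrix $A$ is a square matrix of order $n+1$, known as
Vandermonde's matrix (see Problem 4.9). Since the $x_i$'s are all distinct, it
is quite easy to see that $A$ is nonsingular: if $Ac = 0$ for $c = (c_0, c_1, \ldots, c_n)$,
then the polynomial $c_0 + c_1x + \cdots + c_nx^n$ vanishes at the $n+1$ distinct
points $x_i$, so $c = 0$ by the argument above. Hence the columns of $A$ are
linearly independent, and so are its rows by Theorem 2.26. (See Problem 4.9
for another criterion using the determinant, namely
$\det A = \prod_{0 \le i < j \le n} (x_j - x_i) \neq 0$.)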


Hence $Ax = b$ always has a unique solution, which determines the unique
polynomial $p(x)$ of degree $\le n$ passing through the given $n+1$ points $(x_0, y_0)$,
$(x_1, y_1), \ldots, (x_n, y_n)$ in the plane $\mathbb{R}^2$.
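The procedure above can be sketched in Python, assuming exact rational arithmetic through the standard `fractions` module; the function name `interpolate_vandermonde` is hypothetical and not from the text. It builds the Vandermonde system row by row and solves it by Gauss-Jordan elimination, so it is a small illustration rather than a robust numerical solver (with floating-point data and large $n$, Vandermonde matrices tend to be badly conditioned).

```python
from fractions import Fraction

def interpolate_vandermonde(xs, ys):
    """Coefficients [a_0, ..., a_n] of the polynomial p of degree <= n with p(x_i) = y_i.

    Builds the (n+1) x (n+1) Vandermonde system A a = y and solves it by
    Gauss-Jordan elimination in exact rational arithmetic.
    """
    size = len(xs)
    if len(ys) != size:
        raise ValueError("need one y value for each x value")
    if len(set(xs)) != size:
        raise ValueError("the x_i must be distinct")
    # Augmented matrix [A | y]; row i is [1, x_i, x_i^2, ..., x_i^n | y_i].
    M = [[Fraction(x) ** j for j in range(size)] + [Fraction(y)]
         for x, y in zip(xs, ys)]
    for col in range(size):
        # A nonzero pivot exists in this column because A is nonsingular.
        piv = next(r for r in range(col, size) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]   # make the pivot equal to 1
        for r in range(size):                  # clear this column in every other row
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][size] for i in range(size)]
```

For instance, `interpolate_vandermonde([0, 1, -1, 3], [3, 0, 2, 6])` returns `Fraction` values equal to $3, -2, -2, 1$, the coefficients found in Example 2.23.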


Example 2.23 Given four points

$$(0, 3),\ (1, 0),\ (-1, 2),\ (3, 6)$$

in the plane $\mathbb{R}^2$, let $p(x) = a_0 + a_1x + a_2x^2 + a_3x^3$ be the polynomial passing
through the given four points. Then, we have a system of equations

$$\begin{aligned}
a_0 &= 3\\
a_0 + a_1 + a_2 + a_3 &= 0\\
a_0 - a_1 + a_2 - a_3 &= 2\\
a_0 + 3a_1 + 9a_2 + 27a_3 &= 6.
\end{aligned}$$




Solving this system, one can get $a_0 = 3$, $a_1 = -2$, $a_2 = -2$, $a_3 = 1$, and the
unique polynomial is $p(x) = 3 - 2x - 2x^2 + x^3$.


Problem 2.28 Let $f(x) = \sin x$. Then at $x = 0, \frac{\pi}{4}, \frac{\pi}{2}, \frac{3\pi}{4}, \pi$, the values of $f$ are
$y = 0, \frac{\sqrt{2}}{2}, 1, \frac{\sqrt{2}}{2}, 0$. Find the polynomial $p(x)$ of degree $\le 4$ that passes through
these five points. (One may need to use a computer due to messy computation.)


Problem 2.29 Find a polynomial $p(x) = a + bx + cx^2 + dx^3$ that satisfies
$p(0) = 1$, $p'(0) = 2$, $p(1) = 4$, $p'(1) = 4$.


Problem 2.30 Find the equation of a circle that passes through the three points
$(2, -2)$, $(3, 5)$, and $(-4, 6)$ in the plane $\mathbb{R}^2$.


Remark: Note again that the interpolating polynomial $p(x)$ of degree $\le n$
is uniquely determined when we are given precisely $n+1$ values of $y$ at $n+1$
distinct points $x_0, x_1, \ldots, x_n$.

However, if we are given fewer data, then the polynomial is underdetermined:
i.e., if we have $m$ values of $y$ at $m$ distinct points $x_1, x_2, \ldots, x_m$
for $m < n+1$, then $A$ is an $m \times (n+1)$ matrix with $m < n+1$. The system is
still consistent (the first $m$ columns of $A$ form an invertible Vandermonde
matrix), but now $N(A) \neq \{0\}$, so the coefficient vectors of the interpolating
polynomials form the set $c + N(A)$ for any particular solution $c$. In particular,
there are infinitely many interpolating polynomials, parametrized by the null
space of $A$.

On the other hand, if we are given more than $n+1$ data, then the polynomial
is overdetermined: i.e., if we have $m$ values of $y$ at $m$ distinct points
$x_1, x_2, \ldots, x_m$ for $m > n+1$, then there need not be any interpolating
polynomial of degree $\le n$, since the system could be inconsistent. In this case,
the best one can do is to find a polynomial of degree $\le n$ to which the data is
closest. This case will be discussed again in Section 5.7.


2.9.2 Interpolation II


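There is another kind of interpolating polynomials, called Lagrange polynomials,
which form a basis for the space of polynomials.
Let $\lambda_0, \lambda_1, \ldots, \lambda_n$ be distinct numbers in $\mathbb{R}$. Define, for $x \in \mathbb{R}$,

$$f_i(x) = \frac{(x-\lambda_0)\cdots\widehat{(x-\lambda_i)}\cdots(x-\lambda_n)}{(\lambda_i-\lambda_0)\cdots\widehat{(\lambda_i-\lambda_i)}\cdots(\lambda_i-\lambda_n)} = \prod_{j \neq i} \frac{x-\lambda_j}{\lambda_i-\lambda_j},$$

where the hat $\widehat{(x-\lambda_i)}$ marks a deleted term.
Then $f_i$, $i = 0, 1, \ldots, n$, are $n+1$ polynomials of degree $n$, called the
Lagrange polynomials associated with the $\lambda_i$'s.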


Theorem 2.27 The set $\alpha = \{f_0, f_1, \ldots, f_n\}$ forms a basis for $P_n(\mathbb{R})$.




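Proof: Since $\dim P_n(\mathbb{R}) = n+1$, it is enough to show that $\alpha$ is linearly
independent. Suppose that $\sum_{i=0}^{n} c_if_i = 0$ for some scalars $c_i$'s. Then, since

$$f_i(\lambda_j) = \delta_{ij} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases}$$

we have $0 = \sum_{i=0}^{n} c_if_i(\lambda_j) = c_j$ for $j = 0, 1, \ldots, n$.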


Therefore, any polynomial $p \in P_n(\mathbb{R})$ can be written as

$$p = \sum_{i=0}^{n} p(\lambda_i)f_i,$$

since, if $p = \sum_{i=0}^{n} b_if_i$, then $p(\lambda_j) = \sum_{i=0}^{n} b_if_i(\lambda_j) = b_j$. This expression of
$p$ is called the Lagrange interpolation formula. This also shows that, given
data

$$(\lambda_0, b_0),\ (\lambda_1, b_1),\ \ldots,\ (\lambda_n, b_n),$$

we can construct the Lagrange polynomials $f_i$'s as above. Then

$$p = \sum_{i=0}^{n} b_if_i$$

is the unique polynomial of degree $\le n$ that has the given value $b_j$ at $\lambda_j$ for each $j$.
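Example 2.24 From Example 2.23, given four points

$$(0, 3),\ (1, 0),\ (-1, 2),\ (3, 6)$$

in the plane $\mathbb{R}^2$, take $\lambda_0 = 0$, $\lambda_1 = 1$, $\lambda_2 = -1$, $\lambda_3 = 3$ and construct the $f_i$'s:

$$\begin{aligned}
f_0(x) &= \tfrac{1}{3}(x^3 - 3x^2 - x + 3),\\
f_1(x) &= -\tfrac{1}{4}(x^3 - 2x^2 - 3x),\\
f_2(x) &= -\tfrac{1}{8}(x^3 - 4x^2 + 3x),\\
f_3(x) &= \tfrac{1}{24}(x^3 - x).
\end{aligned}$$

Thus

$$p(x) = 3f_0(x) + 0f_1(x) + 2f_2(x) + 6f_3(x) = x^3 - 2x^2 - 2x + 3.$$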
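The Lagrange interpolation formula can also be sketched in Python, again assuming exact arithmetic with the standard `fractions` module. The helper names `poly_mul_linear`, `lagrange_basis` and `lagrange_interpolate` are hypothetical, and, as an assumption of this sketch, a polynomial is stored as its list of coefficients in ascending powers of $x$. Unlike the Vandermonde approach, no linear system is solved: each $f_i$ is expanded directly from its product formula.

```python
from fractions import Fraction

def poly_mul_linear(coeffs, root):
    """Multiply the polynomial with ascending coefficients `coeffs` by (x - root)."""
    out = [Fraction(0)] * (len(coeffs) + 1)
    for k, c in enumerate(coeffs):
        out[k + 1] += c          # c x^k * x
        out[k] -= root * c       # c x^k * (-root)
    return out

def lagrange_basis(lams, i):
    """Ascending coefficients of f_i(x) = prod_{j != i} (x - lam_j) / (lam_i - lam_j)."""
    num = [Fraction(1)]
    denom = Fraction(1)
    for j, lam in enumerate(lams):
        if j != i:
            num = poly_mul_linear(num, Fraction(lam))
            denom *= Fraction(lams[i]) - Fraction(lam)
    return [c / denom for c in num]

def lagrange_interpolate(lams, bs):
    """Ascending coefficients of p = sum_i b_i f_i (the Lagrange interpolation formula)."""
    p = [Fraction(0)] * len(lams)
    for i, b in enumerate(bs):
        p = [a + Fraction(b) * c for a, c in zip(p, lagrange_basis(lams, i))]
    return p
```

With $\lambda = (0, 1, -1, 3)$ and $b = (3, 0, 2, 6)$ as in Example 2.24, `lagrange_interpolate` returns coefficients equal to `[3, -2, -2, 1]`, i.e., $p(x) = 3 - 2x - 2x^2 + x^3$, in agreement with Example 2.23.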




2.9.3 The Wronskian 


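Consider the vector space $V = D(\mathbb{R})$ of all functions which are differentiable
infinitely many times on $\mathbb{R}$. Then $V$ is an infinite-dimensional vector space,
since it contains the functions $1, x, x^2, \ldots$, any finitely many of which are
linearly independent (see Problem 2.31).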

The linear dependence of vectors in $V$ is defined as follows: a set $\{f_1, f_2, \ldots, f_n\}$
of $n$ functions in $V$ is linearly dependent in $V$ if

$$c_1f_1 + c_2f_2 + \cdots + c_nf_n = 0$$

holds for some constants $c_i$'s, not all of them zero. Note that the zero
function $0$ takes the value zero at every point of the domain. Thus the functions
are linearly independent if

$$c_1f_1(x) + c_2f_2(x) + \cdots + c_nf_n(x) = 0 \quad \text{for all } x \in \mathbb{R}$$

implies that all $c_i = 0$.


Example 2.25 Consider the set of functions $\{g(x) = x|x|,\ h(x) = x^2\}$ on
$\mathbb{R}$. On $(-\infty, 0]$, the set is $\{g(x) = x|x| = -x^2,\ h(x) = x^2\}$, so that $1g + 1h =
0$, while on $[0, \infty)$, $\{g(x) = x|x| = x^2,\ h(x) = x^2\}$, so that $1g - 1h = 0$. Thus
these two functions $g$ and $h$ are linearly dependent on each of $(-\infty, 0]$ and
$[0, \infty)$. However, they are linearly independent functions on $\mathbb{R}$: if
$ag(t) + bh(t) = 0$ for all $t \in \mathbb{R}$, then $t = 1$ and $t = -1$ give $a + b = 0$ and
$-a + b = 0$, so $a = b = 0$.


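Consider the linear dependence of $n$ functions $f_i$'s in $V$:

$$c_1f_1 + c_2f_2 + \cdots + c_nf_n = 0.$$

By differentiating $n-1$ times, one obtains $n$ equations:

$$c_1f_1^{(i)}(x) + c_2f_2^{(i)}(x) + \cdots + c_nf_n^{(i)}(x) = 0, \qquad 0 \le i \le n-1,$$

for all $x \in \mathbb{R}$. Or, in matrix form:

$$A(x)c = \begin{bmatrix} f_1(x) & f_2(x) & \cdots & f_n(x) \\ f_1'(x) & f_2'(x) & \cdots & f_n'(x) \\ \vdots & \vdots & \ddots & \vdots \\ f_1^{(n-1)}(x) & f_2^{(n-1)}(x) & \cdots & f_n^{(n-1)}(x) \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

Hence, if $\{f_1, f_2, \ldots, f_n\}$ is linearly dependent, then a single nonzero vector
$c = (c_1, \ldots, c_n) \in \mathbb{R}^n$ is a solution of this system for all $x \in \mathbb{R}$.
Consequently, $\{f_1, f_2, \ldots, f_n\}$ is linearly independent if this system has only the
trivial solution $c = 0$ at some $x$, which is true if the square matrix $A(x)$ is
invertible at such a point $x \in \mathbb{R}$.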

In Chapter 4, we will define the determinant of a matrix and show that
a square matrix $A$ is invertible if and only if the determinant of $A$ is not zero.
The determinant of the coefficient matrix $A(x)$ is called the Wronskian for
$\{f_1(x), f_2(x), \ldots, f_n(x)\}$ and is denoted by $W(x)$. If there is a point $x_0 \in \mathbb{R}$
such that $W(x_0) \neq 0$, then the coefficient matrix $A(x)$ is nonsingular at
$x = x_0$, and so all $c_i = 0$. Therefore, if the Wronskian $W(x)$ is nonzero at
least at one point $x$ in $\mathbb{R}$, then $\{f_1, f_2, \ldots, f_n\}$ is linearly independent.

However, $W(x) = 0$ for all $x$ does not imply linear dependence of the
given functions. In fact, $W(x) = 0$ means only that the system $A(x)c = 0$
has a nontrivial solution at each point $x \in \mathbb{R}$, but the constants $c_i$'s giving
such a nontrivial solution may vary as $x$ varies in the domain. (See Example 2.25.)


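Example 2.26 (1) The set of functions given in Example 2.25 is linearly
independent on $\mathbb{R}$. But the Wronskian for them is

$$W(x) = \det\begin{bmatrix} x|x| & x^2 \\ 2|x| & 2x \end{bmatrix} = 2x^2|x| - 2x^2|x| = 0.$$

(2) For the sets of functions $F_1 = \{x, \cos x, \sin x\}$ and $F_2 = \{x, e^x, e^{-x}\}$,
the Wronskians are

$$W_1(x) = \det\begin{bmatrix} x & \cos x & \sin x \\ 1 & -\sin x & \cos x \\ 0 & -\cos x & -\sin x \end{bmatrix} = x$$

and

$$W_2(x) = \det\begin{bmatrix} x & e^x & e^{-x} \\ 1 & e^x & -e^{-x} \\ 0 & e^x & e^{-x} \end{bmatrix} = 2x.$$

Since $W_i(x) \neq 0$ for $x \neq 0$, both $F_1$ and $F_2$ are linearly independent.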
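The Wronskian test can be checked numerically at a single point. The Python sketch below works under stated assumptions: the caller supplies each function together with its derivatives as Python callables (nothing is differentiated automatically), and the names `det` and `wronskian` are hypothetical. Because it works in floating point, a tiny but nonzero value may be round-off, so such a check suggests, rather than proves, that $W(x_0) \neq 0$.

```python
import math

def det(M):
    """Determinant by Gaussian elimination with partial pivoting (floating point)."""
    A = [list(map(float, row)) for row in M]
    n = len(A)
    d = 1.0
    for c in range(n):
        # Use the row with the largest entry in column c as the pivot row.
        p = max(range(c, n), key=lambda r: abs(A[r][c]))
        if A[p][c] == 0.0:
            return 0.0  # every entry of column c from row c down is zero: singular
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d      # a row swap flips the sign of the determinant
        d *= A[c][c]
        for r in range(c + 1, n):
            f = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= f * A[c][k]
    return d

def wronskian(derivs, x):
    """W(x) = det A(x), where A(x)[i][j] = f_j^{(i)}(x).

    derivs[j] is the list [f_j, f_j', ..., f_j^{(n-1)}] of callables.
    """
    n = len(derivs)
    A = [[derivs[j][i](x) for j in range(n)] for i in range(n)]
    return det(A)

# F_1 = {x, cos x, sin x}, each with its derivatives up to order 2.
F1 = [
    [lambda t: t, lambda t: 1.0, lambda t: 0.0],
    [math.cos, lambda t: -math.sin(t), lambda t: -math.cos(t)],
    [math.sin, math.cos, lambda t: -math.sin(t)],
]
# F_2 = {x, e^x, e^{-x}}.
F2 = [
    [lambda t: t, lambda t: 1.0, lambda t: 0.0],
    [math.exp, math.exp, math.exp],
    [lambda t: math.exp(-t), lambda t: -math.exp(-t), lambda t: math.exp(-t)],
]
# Example 2.25: g(x) = x|x| with g'(x) = 2|x|, and h(x) = x^2 with h'(x) = 2x.
G = [
    [lambda t: t * abs(t), lambda t: 2 * abs(t)],
    [lambda t: t * t, lambda t: 2 * t],
]
```

Here `wronskian(F1, 2.0)` is approximately $2 = W_1(2)$ and `wronskian(F2, 1.5)` approximately $3 = W_2(1.5)$, while for the functions `G` of Example 2.25 the value is zero (up to round-off) at every sample point, consistent with $W(x) \equiv 0$ even though $g$ and $h$ are independent.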


Problem 2.31 Show that $1, x, x^2, \ldots, x^n$ are linearly independent in the vector
space $C(\mathbb{R})$ of continuous functions.


2.10 Exercises 


2.1. Let $V$ be the set of all pairs $(x, y)$ of real numbers. Define

$$(x, y) + (x_1, y_1) = (x + x_1,\ y + y_1), \qquad c(x, y) = (cx,\ y).$$

Is $V$ a vector space with these operations?




2.2. For $x, y \in \mathbb{R}^n$ and $k \in \mathbb{R}$, define two operations as

$$x \oplus y = x - y, \qquad k \cdot x = -kx.$$

The operations on the right sides are the usual ones. Which of the rules in
the definition of a vector space are satisfied for $(\mathbb{R}^n, \oplus, \cdot)$?


2.3. Determine whether the given set is a vector space with the usual addition 
and scalar multiplication of functions. 
(1) The set of all functions $f$ defined on the interval $[-1, 1]$ such that
$f(0) = 0$.
(2) The set of all functions $f$ defined on $\mathbb{R}$ such that $\lim_{x \to \infty} f(x) = 0$.
(3) The set of all twice differentiable functions $f$ defined on $\mathbb{R}$ such that
$f''(x) + f(x) = 0$.


2.4. Let $C^2[-1, 1]$ be the vector space of all functions with continuous second
derivatives on the domain $[-1, 1]$. Which of the following subsets is a
subspace of $C^2[-1, 1]$?

(1) W ={f(x) € C[-1, 1]: f"(a)+ f(x) = eit 
(2) W = {f(x) € C1, 1]: f"(@) + f(z) =, -1< <1}. 


2.5. Which of the following subsets of $C[-1, 1]$ is a subspace of the vector space
$C[-1, 1]$ of continuous functions on $[-1, 1]$?


(1) $W = \{f(x) \in C[-1, 1] : f(-1) = -f(1)\}$.

(2) $W = \{f(x) \in C[-1, 1] : f(x) \ge 0 \text{ for all } x \text{ in } [-1, 1]\}$.
(3) $W = \{f(x) \in C[-1, 1] : f(-1) = -2 \text{ and } f(1) = 2\}$.
(4) W = {f(x) € C[- 1 1]: f$) = 0}. 


2.6. Does the vector $(3, -1, 0, -1)$ belong to the subspace of $\mathbb{R}^4$ spanned by the
vectors $(2, -1, 3, 2)$, $(-1, 1, 1, -3)$ and $(1, 1, 9, -5)$?


2.7. Express the given function as a linear combination of functions in the given
set $Q$.


(1) $p(x) = -1 - 3x + 3x^2$ and $Q = \{p_1(x), p_2(x), p_3(x)\}$, where
pi(z) =14+ 22 +27, po(x) =2+52, ac ee ape 

(2) $p(x) = -2 - 4x + x^2$ and $Q = \{p_1(x), p_2(x), p_3(x), p_4(x)\}$, where
$p_1(x) = 1 + 2x^2 + x^3$, $p_2(x) = 1 + x + 2x^3$, $p_3(x) = -1 - 3x - 4x^3$,
$p_4(x) = 1 + 2x - x^2 + x^3$.


2.8. Is $\{\cos^2 x, \sin^2 x, 1, e^x\}$ linearly independent in the vector space $C(\mathbb{R})$?
2.9. Show that the given sets of functions are linearly independent in the vector
space $C[-\pi, \pi]$.
(1) {1, z, ae? cane BT. 
(2) hee e7} 


(3) $\{1, \sin x, \cos x, \ldots, \sin kx, \cos kx\}$


2.10. Are the vectors

$$v_1 = (1, 1, 2, 4),\quad v_2 = (2, -1, -5, 2),\quad v_3 = (1, -1, -4, 0),\quad v_4 = (2, 1, 1, 6)$$

linearly independent in the 4-space $\mathbb{R}^4$?


2.11. In the 3-space $\mathbb{R}^3$, let $W$ be the set of all vectors $(x_1, x_2, x_3)$ that satisfy
the equation $x_1 - x_2 - x_3 = 0$. Prove that $W$ is a subspace of $\mathbb{R}^3$. Find a
basis for the subspace $W$.


2.12. Let $W$ be the subspace of $C[-\pi, \pi]$ consisting of functions of the form
$f(x) = a\sin x + b\cos x$. Determine the dimension of $W$.


2.13. Let $V$ denote the set of all infinite sequences of real numbers:

$$V = \{x = \{x_i\}_{i=1}^{\infty} : x_i \in \mathbb{R}\}.$$

If $x = \{x_i\}$ and $y = \{y_i\}$ are in $V$, then $x + y$ is the sequence $\{x_i + y_i\}_{i=1}^{\infty}$.
If $c$ is a real number, then $cx$ is the sequence $\{cx_i\}_{i=1}^{\infty}$.


(1) Prove that $V$ is a vector space.
(2) Prove that $V$ is not finite dimensional.


2.14. For two matrices $A$ and $B$ for which $AB$ can be defined, prove the following
statements:

(1) If both A and B have linearly independent column vectors, then the 
column vectors of AB are also linearly independent. 

(2) If both A and B have linearly independent row vectors, then the row 
vectors of AB are also linearly independent. 

(3) If the column vectors of B are linearly dependent, then the column 
vectors of AB are also linearly dependent. 

(4) If the row vectors of A are linearly dependent, then the row vectors of 
AB are also linearly dependent. 


2.15. Let $U = \{(x, y, z) : 2x + 3y + z = 0\}$ and $V = \{(x, y, z) : x + 2y - z = 0\}$
be subspaces of $\mathbb{R}^3$.
(1) Find a basis for $U \cap V$.
(2) Determine the dimension of $U + V$.
(3) Describe $U$, $V$, $U \cap V$ and $U + V$ geometrically.


2.16. How many $5 \times 5$ permutation matrices are there? Are they linearly
independent? Do they span the vector space $M_{5 \times 5}(\mathbb{R})$?


2.17. Find bases for the row space, the column space, and the null space for each
of the following matrices.


1 2 1 5 0 2 1 —5 
()A=]2 4 -3 0|, @Q)B=] 1 1 -2 Deis 
1 2 -1 1 1 5 0 0 
0 1 -1 -2 1 
1 3 2 1 1 —i1 3 1 
(383)C=|2 6 4], (4)D=}]2 1 -1 8 38 
3.9 6 0 0 -2 2 1 
3 5 —5 5 10 
2 2 -6 8 
. Find the rank of A as a function of z: A= | 3 3 -9 8 
1 1 g 4 
2.19. Find the rank and the largest invertible submatrix of each of the following
matrices.
0 0 0 1 Te 3 1 
1 1 0 1 
001 0 | 1 4 0 1 
0000 1 000 


2.20. For any nonzero column vectors $u$ and $v$, show that the matrix $A = uv^T$ has
rank 1. Conversely, show that every matrix of rank 1 can be written as $uv^T$ for some
vectors $u$ and $v$.


2.21. Determine whether the following statements are true or false, and justify
your answers.

(1) The set of all $n \times n$ matrices $A$ such that $A^T = A^{-1}$ is a subspace of
the vector space $M_{n \times n}(\mathbb{R})$.
(2) If $\alpha$ and $\beta$ are linearly independent subsets of a vector space $V$, then
so is their union $\alpha \cup \beta$.
(3) If $U$ and $W$ are subspaces of a vector space $V$ with bases $\alpha$ and $\beta$
respectively, then the intersection $\alpha \cap \beta$ is a basis for $U \cap W$.
(4) Let $U$ be the row-echelon form of a square matrix $A$. If the first $r$
columns of $U$ are linearly independent, then so are the first $r$ columns
of $A$.
(5) Any two row-equivalent matrices have the same column space.
(6) Let $A$ be an $m \times n$ matrix with rank $m$. Then the column vectors of $A$
span $\mathbb{R}^m$.
(7) Let $A$ be an $m \times n$ matrix with rank $n$. Then $Ax = b$ has at most one
solution.
(8) If $U$ is a subspace of $V$ and $x$, $y$ are vectors in $V$ such that $x + y$ is
contained in $U$, then $x \in U$ and $y \in U$.
(9) Let $U$ and $V$ be vector spaces. Then $U$ is a subspace of $V$ if and only
if $\dim U \le \dim V$.
(10) For any $m \times n$ matrix $A$, $\dim C(A) + \dim N(A^T) = m$.


Chapter 3 


Linear Transformations 


3.1 Basic properties of linear transformations 


An $m \times n$ matrix $A$ transforms a vector $x$ in $\mathbb{R}^n$ to a vector $b$ in $\mathbb{R}^m$ via the
matrix multiplication $Ax = b$, which satisfies the arithmetic rule $A(x + ky) =
Ax + kAy$ for any $x, y \in \mathbb{R}^n$ and any scalar $k \in \mathbb{R}$. This kind of transformation,
called a linear transformation, plays a special role in mathematics and is
used, in particular, to classify vector spaces.


Definition 3.1 Let $V$ and $W$ be vector spaces. A transformation $T : V \to
W$ is called a linear transformation from $V$ to $W$ if for all $x, y \in V$ and any
scalar $k \in \mathbb{R}$ the following conditions hold:


(1) T(x+y) =T(x) +T(y), 
(2) T(kx) = kT (x). 


We often call T simply linear. Note that the two conditions for a linear 
transformation can be combined into a single requirement: 


T(x + ky) = T(x) +kT(y), 


by which we mean that $T$ preserves the algebraic operations: the vector
addition and the scalar multiplication. Geometrically, this linearity means
that $T$ transforms a straight line into a straight line (or into a single point),
since $x + ky$ for $k \in \mathbb{R}$ represents a straight line through $x$ in the direction $y$
in $V$, and its image $T(x + ky) = T(x) + kT(y)$ represents a straight line through
$T(x)$ in the direction of $T(y)$ in $W$ if $T(y) \neq 0$, or the single point $T(x)$ if $T(y) = 0$.






Example 3.1 (1) For a vector space $V$, the identity transformation
$Id : V \to V$ is defined by $Id(x) = x$ for all $x \in V$. If $W$ is another vector
space, the zero transformation $T_0 : V \to W$ is defined by $T_0(x) = 0$ (the
zero vector) for all $x \in V$. Clearly, both transformations are linear.

(2) In calculus, the differentiation and integration of functions are well
studied. In fact, let $D(\mathbb{R})$ be the set of infinitely differentiable functions on $\mathbb{R}$,
which clearly becomes a subspace of the space $C(\mathbb{R})$ of the continuous functions
on $\mathbb{R}$. Then it is proven in calculus that the differentiation and the integration

$$D : D(\mathbb{R}) \to D(\mathbb{R}), \qquad I : D(\mathbb{R}) \to D(\mathbb{R}),$$

given by

$$D(f) = f'(x), \qquad I(f) = \int_0^x f(t)\,dt, \quad \text{for } x \in \mathbb{R},$$

satisfy linearity. Thus they are linear transformations. Many problems
involving derivatives and integrals may be reformulated in terms of linear
transformations.
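(3) The transformation $\operatorname{tr} : M_{n \times n}(\mathbb{R}) \to \mathbb{R}$ defined as the sum of the diagonal
entries

$$\operatorname{tr}(A) = a_{11} + a_{22} + \cdots + a_{nn} = \sum_{i=1}^{n} a_{ii},$$

for $A = [a_{ij}] \in M_{n \times n}(\mathbb{R})$, is called the trace. It is easy to show that

$$\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B) \quad\text{and}\quad \operatorname{tr}(kA) = k\operatorname{tr}(A)$$

for any two matrices $A$ and $B$ in $M_{n \times n}(\mathbb{R})$ and any scalar $k$, which means that 'tr'
is a linear transformation from $M_{n \times n}(\mathbb{R})$ to the 1-space $\mathbb{R}$. In particular, one can
easily show that the set of all $n \times n$ matrices with trace 0 is a subspace of the
vector space $M_{n \times n}(\mathbb{R})$.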
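As a small concrete check of these two identities, here is a Python sketch with matrices stored as lists of rows; the helper names `trace`, `mat_add` and `mat_scale` are hypothetical. Checking a few matrices is only an illustration, not a proof, of linearity.

```python
def trace(A):
    """tr(A) = a_11 + a_22 + ... + a_nn for a square matrix A (list of rows)."""
    n = len(A)
    if any(len(row) != n for row in A):
        raise ValueError("the trace is defined only for square matrices")
    return sum(A[i][i] for i in range(n))

def mat_add(A, B):
    """Entrywise sum A + B of two matrices of the same size."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_scale(k, A):
    """Scalar multiple kA."""
    return [[k * a for a in row] for row in A]

A = [[1, 2], [3, 4]]
B = [[0, 5], [-1, 2]]
k = 3
# Both identities at once: tr(A + kB) = tr(A) + k tr(B).
print(trace(mat_add(A, mat_scale(k, B))), trace(A) + k * trace(B))  # 11 11
```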


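Problem 3.1 Let $W = \{A \in M_{n \times n}(\mathbb{R}) : \operatorname{tr}(A) = 0\}$. Show that $W$ is a subspace,
and find a basis for $W$.


Problem 3.2 Show that, for any matrices $A$ and $B$ in $M_{n \times n}(\mathbb{R})$, $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.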


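Example 3.2 (1) A linear transformation on $\mathbb{R}^2$ defined by the matrix

$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

is called the rotation by the angle $\theta$: it rotates each vector of $\mathbb{R}^2$
counterclockwise through the angle $\theta$ about the origin.

(2) The projection on the $x$-axis is the linear transformation $P : \mathbb{R}^2 \to
\mathbb{R}^2$ defined by, for $\mathbf{x} = (x, y) \in \mathbb{R}^2$,

$$P(\mathbf{x}) = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ 0 \end{bmatrix}.$$

(3) The linear transformation $T : \mathbb{R}^2 \to \mathbb{R}^2$ defined by

$$T(\mathbf{x}) = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \\ -y \end{bmatrix}$$

is the reflection about the $x$-axis.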


Figure 3.1: Three linear transformations on $\mathbb{R}^2$


Problem 3.3 Find the matrix of the reflection about the line $y = x$ in $\mathbb{R}^2$.


Example 3.3 Consider the following functions:
(1) $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = 2x$;
(2) $g : \mathbb{R} \to \mathbb{R}$ defined by $g(x) = x^2 - x$;
(3) $h : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $h(x, y) = (x - y, 2x)$;
(4) $k : \mathbb{R}^2 \to \mathbb{R}^2$ defined by $k(x, y) = (xy, x^2 + 1)$;
(5) L: R > R defined by L(x, y) =y — zx. 
One can easily see that $g$ and $k$ are not linear, while $f$, $h$ and $L$ are linear.
Moreover, no polynomial function on $\mathbb{R}$ of degree greater than one is linear.


The following is a direct consequence of the definition, and the proof is 
an easy exercise. 


Theorem 3.1 Let $T : V \to W$ be a linear transformation. Then
(1) $T(0) = 0$.
(2) For any $x_1, x_2, \ldots, x_n \in V$ and scalars $k_1, k_2, \ldots, k_n$,

$$T(k_1x_1 + k_2x_2 + \cdots + k_nx_n) = k_1T(x_1) + k_2T(x_2) + \cdots + k_nT(x_n).$$




Definition 3.2 Let $V$ and $W$ be two vector spaces, and let $T : V \to W$ be
a linear transformation from $V$ into $W$.

(1) $\operatorname{Ker}(T) = \{v \in V : T(v) = 0\} \subseteq V$ is called the kernel of $T$.

(2) $\operatorname{Im}(T) = \{T(v) \in W : v \in V\} = T(V) \subseteq W$ is called the image of $T$.


Example 3.4 Let $V$ and $W$ be vector spaces and let $Id : V \to V$ and
$T_0 : V \to W$ be the identity and the zero transformations, respectively.
Then it is easy to see that $\operatorname{Ker}(Id) = \{0\}$, $\operatorname{Im}(Id) = V$, $\operatorname{Ker}(T_0) = V$, and
$\operatorname{Im}(T_0) = \{0\}$.


Theorem 3.2 Let $T : V \to W$ be a linear transformation from a vector
space $V$ to a vector space $W$. Then the kernel $\operatorname{Ker}(T)$ and the image $\operatorname{Im}(T)$
are subspaces of $V$ and $W$, respectively.


Proof: Since $T(0) = 0$, each of $\operatorname{Ker}(T)$ and $\operatorname{Im}(T)$ is nonempty, containing $0$.
(1) For any $x, y \in \operatorname{Ker}(T)$ and for any scalar $k$,

$$T(x + ky) = T(x) + kT(y) = 0 + k0 = 0.$$

Hence $x + ky \in \operatorname{Ker}(T)$, so that $\operatorname{Ker}(T)$ is a subspace of $V$.
(2) If $v, w \in \operatorname{Im}(T)$, then there exist $x$ and $y$ in $V$ such that $T(x) = v$
and $T(y) = w$. Thus, for any scalar $k$,

$$v + kw = T(x) + kT(y) = T(x + ky).$$

Thus $v + kw \in \operatorname{Im}(T)$, so that $\operatorname{Im}(T)$ is a subspace of $W$.


Example 3.5 Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be the linear transformation defined by an
$m \times n$ matrix $A$ as in Example 3.1 (1). The kernel $\operatorname{Ker}(A)$ of $A$ consists of
all solutions of the homogeneous system $Ax = 0$. Hence, $\operatorname{Ker}(A)$ is nothing
but the null space $N(A)$ of the matrix $A$, and $\operatorname{Im}(A) = \{Ax : x \in \mathbb{R}^n\} =
C(A) \subseteq \mathbb{R}^m$.


One of the most important properties of linear transformations is that 
they are completely determined by their values on a basis. 


Theorem 3.3 Let $V$ and $W$ be vector spaces. Let $\{v_1, v_2, \ldots, v_n\}$ be a
basis for $V$ and let $w_1, w_2, \ldots, w_n$ be any vectors (possibly repeated) in
$W$. Then there exists a unique linear transformation $T : V \to W$ such that
$T(v_i) = w_i$ for $i = 1, 2, \ldots, n$.




Proof: Let $x \in V$. Then it has a unique expression $x = \sum_{i=1}^{n} a_iv_i$ for
some scalars $a_1, a_2, \ldots, a_n$. Define

$$T : V \to W \quad\text{by}\quad T(x) = \sum_{i=1}^{n} a_iw_i.$$

In particular, $T(v_i) = w_i$ for $i = 1, 2, \ldots, n$.
Linearity: For $x = \sum_{i=1}^{n} a_iv_i$, $y = \sum_{i=1}^{n} b_iv_i \in V$ and $k$ a scalar, we
have $x + ky = \sum_{i=1}^{n} (a_i + kb_i)v_i$. Then

$$T(x + ky) = \sum_{i=1}^{n} (a_i + kb_i)w_i = \sum_{i=1}^{n} a_iw_i + k\sum_{i=1}^{n} b_iw_i = T(x) + kT(y).$$

Uniqueness: Suppose that $S : V \to W$ is linear and $S(v_i) = w_i$ for
$i = 1, 2, \ldots, n$. Then for any $x \in V$ with $x = \sum_{i=1}^{n} a_iv_i$, we have

$$S(x) = \sum_{i=1}^{n} a_iS(v_i) = \sum_{i=1}^{n} a_iw_i = T(x).$$

Hence, we have $S = T$.


Therefore, from an assignment $T(v_i) = w_i$ of an arbitrary vector in $W$
to each vector $v_i$ in a basis for $V$, one can extend it uniquely to a linear
transformation $T$ from $V$ into $W$. The uniqueness in Theorem
3.3 may be rephrased as the following corollary.


Corollary 3.4 Let $V$ and $W$ be vector spaces, and let $\{v_1, v_2, \ldots, v_n\}$ be a
basis for $V$. If $S, T : V \to W$ are linear transformations and $S(v_i) = T(v_i)$
for $i = 1, 2, \ldots, n$, then $S = T$, i.e., $S(x) = T(x)$ for all $x \in V$.


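Example 3.6 Let $w_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$, $w_2 = \begin{bmatrix} 2 \\ -1 \end{bmatrix}$, $w_3 = \begin{bmatrix} 4 \\ 3 \end{bmatrix}$ be three vectors in $\mathbb{R}^2$.
(1) Let $\alpha = \{e_1, e_2, e_3\}$ be the standard basis for the 3-space $\mathbb{R}^3$, and
let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be the linear transformation defined by

$$T(e_1) = w_1, \quad T(e_2) = w_2, \quad T(e_3) = w_3.$$

Find a formula for $T\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$, and then use it to compute $T\begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix}$.

(2) Let $\beta = \{v_1, v_2, v_3\}$ be another basis for $\mathbb{R}^3$, where $v_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}$,
$v_2 = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}$, $v_3 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}$, and let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be the linear transformation
defined by

$$T(v_1) = w_1, \quad T(v_2) = w_2, \quad T(v_3) = w_3.$$

Find a formula for $T\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$, and then use it to compute $T\begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix}$.

Solution: (1) For $x = (x_1, x_2, x_3) = x_1e_1 + x_2e_2 + x_3e_3 \in \mathbb{R}^3$,

$$T(x) = \sum_{i=1}^{3} x_iT(e_i) = \sum_{i=1}^{3} x_iw_i = x_1\begin{bmatrix} 1 \\ 0 \end{bmatrix} + x_2\begin{bmatrix} 2 \\ -1 \end{bmatrix} + x_3\begin{bmatrix} 4 \\ 3 \end{bmatrix} = \begin{bmatrix} x_1 + 2x_2 + 4x_3 \\ -x_2 + 3x_3 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 4 \\ 0 & -1 & 3 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$

This means that $T = \begin{bmatrix} 1 & 2 & 4 \\ 0 & -1 & 3 \end{bmatrix}$. In particular,

$$T\begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 1 & 2 & 4 \\ 0 & -1 & 3 \end{bmatrix}\begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 16 \\ 18 \end{bmatrix}.$$

(2) We first express $x = (x_1, x_2, x_3)$ as a linear combination of $v_1, v_2, v_3$:

$$\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \sum_{i=1}^{3} k_iv_i = k_1\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} + k_2\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + k_3\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} k_1 + k_2 + k_3 \\ k_1 + k_2 \\ k_1 \end{bmatrix}.$$

Thus we obtain a system of equations

$$\begin{aligned} k_1 + k_2 + k_3 &= x_1\\ k_1 + k_2 &= x_2\\ k_1 &= x_3. \end{aligned}$$

The solution is $k_1 = x_3$, $k_2 = x_2 - x_3$, $k_3 = x_1 - x_2$. This means that
$x = x_3v_1 + (x_2 - x_3)v_2 + (x_1 - x_2)v_3$, and

$$\begin{aligned} T(x) &= x_3T(v_1) + (x_2 - x_3)T(v_2) + (x_1 - x_2)T(v_3) \\ &= x_3\begin{bmatrix} 1 \\ 0 \end{bmatrix} + (x_2 - x_3)\begin{bmatrix} 2 \\ -1 \end{bmatrix} + (x_1 - x_2)\begin{bmatrix} 4 \\ 3 \end{bmatrix} \\ &= \begin{bmatrix} 4x_1 - 2x_2 - x_3 \\ 3x_1 - 4x_2 + x_3 \end{bmatrix} = \begin{bmatrix} 4 & -2 & -1 \\ 3 & -4 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}. \end{aligned}$$

That is, $T = \begin{bmatrix} 4 & -2 & -1 \\ 3 & -4 & 1 \end{bmatrix}$. In particular,

$$T\begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 4 & -2 & -1 \\ 3 & -4 & 1 \end{bmatrix}\begin{bmatrix} 2 \\ -3 \\ 5 \end{bmatrix} = \begin{bmatrix} 9 \\ 23 \end{bmatrix}.$$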
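Theorem 3.3 and part (2) of Example 3.6 suggest a general recipe for the standard matrix of $T$: write each $e_j$ in terms of the basis $v_1, \ldots, v_n$, and then $T(e_j)$ is the same combination of the $w_i$'s. The following Python sketch of that recipe assumes exact arithmetic with the standard `fractions` module; the helper names `solve`, `matrix_from_basis_images` and `apply` are hypothetical.

```python
from fractions import Fraction

def solve(B, b):
    """Solve B k = b for a square nonsingular B (list of rows) by Gauss-Jordan elimination."""
    n = len(B)
    M = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(B, b)]
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            raise ValueError("the given vectors do not form a basis")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                f = M[r][c]
                M[r] = [a - f * d for a, d in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def matrix_from_basis_images(vs, ws):
    """Standard matrix of the unique linear T with T(v_i) = w_i (Theorem 3.3).

    vs: a basis v_1, ..., v_n of R^n; ws: arbitrary vectors w_1, ..., w_n in R^m.
    Column j is T(e_j) = sum_i k_i w_i, where e_j = sum_i k_i v_i.
    """
    n, m = len(vs), len(ws[0])
    B = [[vs[i][r] for i in range(n)] for r in range(n)]  # the v_i as columns
    cols = []
    for j in range(n):
        e_j = [1 if r == j else 0 for r in range(n)]
        k = solve(B, e_j)  # coordinates of e_j with respect to v_1, ..., v_n
        cols.append([sum(k[i] * ws[i][r] for i in range(n)) for r in range(m)])
    return [[cols[j][r] for j in range(n)] for r in range(m)]

def apply(T, x):
    """Matrix-vector product T x."""
    return [sum(t * xi for t, xi in zip(row, x)) for row in T]
```

With `ws = [(1, 0), (2, -1), (4, 3)]`, the standard basis gives the matrix $\begin{bmatrix} 1 & 2 & 4 \\ 0 & -1 & 3 \end{bmatrix}$ of part (1), and the basis `[(1, 1, 1), (1, 1, 0), (1, 0, 0)]` gives $\begin{bmatrix} 4 & -2 & -1 \\ 3 & -4 & 1 \end{bmatrix}$ of part (2); `apply` then reproduces $(16, 18)$ and $(9, 23)$.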


Problem 3.4 Is there a linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ such that $T(3, 1, 0) =
(1, 1)$ and $T(-6, -2, 0) = (2, 1)$? If yes, can you find an expression of $T(x)$ for
$x = (x_1, x_2, x_3)$ in $\mathbb{R}^3$?


Problem 3.5 Let $V$ and $W$ be vector spaces and $T : V \to W$ be linear. Let
$\{w_1, w_2, \ldots, w_k\}$ be a linearly independent subset of the image $\operatorname{Im}(T) \subseteq W$.
Suppose that $\alpha = \{v_1, v_2, \ldots, v_k\}$ is chosen so that $T(v_i) = w_i$ for $i = 1, 2, \ldots, k$.
Prove that $\alpha$ is linearly independent.


3.2 Invertible linear transformations 


A transformation $f$ from a set $X$ to a set $Y$ is said to be invertible if there
is a transformation $g$ from $Y$ to $X$ such that their compositions satisfy
$g \circ f = Id$ and $f \circ g = Id$. Such a transformation $g$ is called the inverse
transformation of $f$ and is denoted by $g = f^{-1}$. One can notice that if there
exists an invertible transformation from a set $X$ into another set $Y$, then it
gives a one-to-one correspondence between these two sets, so that they can
be identified as sets. A useful criterion for a transformation between two
given sets to be invertible is that it is one-to-one and onto. Recall that a
transformation $f : X \to Y$ is one-to-one (or injective) if $f(u) = f(v)$ in
$Y$ implies $u = v$ in $X$, and is onto (or surjective) if for each element $y$
in $Y$ there exists an element $x$ in $X$ such that $f(x) = y$. A transformation
is said to be bijective if it is both one-to-one and onto, that is, if for each
element $y$ in $Y$ there is a unique element $x$ in $X$ such that $f(x) = y$.


Lemma 3.5 A transformation $f : X \to Y$ is invertible if and only if it is
bijective (or one-to-one and onto).


Proof: Suppose $f : X \to Y$ is invertible, and let $g : Y \to X$ be its inverse.
If $f(u) = f(v)$, then $u = g(f(u)) = g(f(v)) = v$. Thus $f$ is one-to-one. For
each $y \in Y$, let $g(y) = x$ in $X$. Then $f(x) = f(g(y)) = y$. Thus it is onto.
Conversely, suppose $f$ is bijective. Then, for each $y \in Y$, there is a unique
$x \in X$ such that $f(x) = y$. Now for each $y \in Y$ define $g(y) = x$. Then one
can easily check that $g : Y \to X$ is well-defined, and that $f \circ g = Id$ and
$g \circ f = Id$, i.e., $g$ is the inverse transformation of $f$.


If $T : V \to W$ and $S : W \to Z$ are linear transformations, then their
composition $(S \circ T)(v) = S(T(v))$ is also a linear transformation. In
particular, if two linear transformations are defined by matrices $A : \mathbb{R}^n \to \mathbb{R}^m$
and $B : \mathbb{R}^m \to \mathbb{R}^k$ as in Example 3.1 (1), then their composition is just the
matrix product $BA$ of them, i.e., $(B \circ A)(x) = B(Ax) = (BA)x$.

The following lemma shows that if a given transformation is an invertible 
linear transformation from a vector space into another, then the linearity is 
preserved by the inversion. 


Lemma 3.6 Let $V$ and $W$ be vector spaces. If $T : V \to W$ is an invertible
linear transformation, then its inverse $T^{-1} : W \to V$ is also linear.


Proof: Let $w_1, w_2 \in W$, and let $k$ be any scalar. Since $T$ is invertible,
it is one-to-one and onto, so there exist unique vectors $v_1$ and $v_2$ in $V$ such
that $T(v_1) = w_1$ and $T(v_2) = w_2$. Then

$$\begin{aligned} T^{-1}(w_1 + kw_2) &= T^{-1}(T(v_1) + kT(v_2)) \\ &= T^{-1}(T(v_1 + kv_2)) \\ &= v_1 + kv_2 \\ &= T^{-1}(w_1) + kT^{-1}(w_2). \end{aligned}$$




Definition 3.3 A linear transformation $T : V \to W$ from a vector space $V$
to another $W$ is called an isomorphism if it is invertible (or one-to-one and
onto). In this case, we say that $V$ and $W$ are isomorphic to each other.


Example 3.7 Consider the vector space $P_2(\mathbb{R}) = \{a + bx + cx^2 : a, b, c \in \mathbb{R}\}$
of all polynomials of degree $\le 2$ with real coefficients. If, to each polynomial
$a + bx + cx^2$ in the space $P_2(\mathbb{R})$, we assign the column vector $[a\ b\ c]^T$ in
$\mathbb{R}^3$, then one can easily show that this assignment is an isomorphism from
$P_2(\mathbb{R})$ to $\mathbb{R}^3$. Thus one can identify the polynomial $a + bx + cx^2$ with the
column vector $[a\ b\ c]^T$; that is, these two vector spaces can be considered as
the same vector space. In this sense, two isomorphic vector spaces may be
considered to be the same space. In general, the vector space $P_n(\mathbb{R})$ is
identified with the $(n+1)$-space $\mathbb{R}^{n+1}$.


Lemma 3.6 shows that, if $T$ is an isomorphism, then its inverse $T^{-1}$ is also
an isomorphism with $(T^{-1})^{-1} = T$. In particular, if a linear transformation
$A : \mathbb{R}^n \to \mathbb{R}^n$ is defined by an invertible matrix $A$, then the inverse matrix
$A^{-1}$ defines the inverse linear transformation, so that it is also an isomorphism
on $\mathbb{R}^n$. That is, a linear transformation $A : \mathbb{R}^n \to \mathbb{R}^n$ defined by an $n \times n$
square matrix $A$ is an isomorphism if and only if $A$ is invertible, that is,
$\operatorname{rank} A = n$.


Problem 3.6 Suppose that $S$ and $T$ are linear transformations whose composition
$S \circ T$ is well-defined. Prove that

(1) if $S \circ T$ is one-to-one, so is $T$,

(2) if $S \circ T$ is onto, so is $S$,

(3) if $S$ and $T$ are isomorphisms, then so is $S \circ T$,

(4) if $A$ and $B$ are two $n \times n$ matrices of rank $n$, then so is $AB$.


Problem 3.7 Let $T : V \to W$ be a linear transformation. Prove that
(1) $T$ is one-to-one if and only if $\operatorname{Ker}(T) = \{0\}$.
(2) If $V = W$, then $T$ is one-to-one if and only if $T$ is onto.


Theorem 3.7 Two vector spaces V and W are isomorphic if and only if 
dim V = dim W. 


Proof: Let $T : V \to W$ be an isomorphism, and let $\{v_1, v_2, \ldots, v_n\}$ be
a basis for $V$. Then we show that the set $\{T(v_1), T(v_2), \ldots, T(v_n)\}$ is a
basis for $W$, so that $\dim W = n = \dim V$.

(1) It is linearly independent: Since $T$ is one-to-one, the equation

$$0 = c_1T(v_1) + c_2T(v_2) + \cdots + c_nT(v_n) = T(c_1v_1 + c_2v_2 + \cdots + c_nv_n)$$

implies that $0 = c_1v_1 + c_2v_2 + \cdots + c_nv_n$. Since the $v_i$'s are linearly
independent, we have $c_i = 0$ for all $i = 1, 2, \ldots, n$.

(2) It spans $W$: Since $T$ is onto, for any $y \in W$ there exists an $x \in V$
such that $T(x) = y$. Write $x = \sum_{i=1}^{n} a_iv_i$. Then

$$y = T(x) = T(a_1v_1 + a_2v_2 + \cdots + a_nv_n) = a_1T(v_1) + a_2T(v_2) + \cdots + a_nT(v_n),$$

i.e., $y$ is a linear combination of $T(v_1), T(v_2), \ldots, T(v_n)$.

Conversely, suppose that $\dim V = \dim W = n$. Then one can choose
bases $\{v_1, v_2, \ldots, v_n\}$ and $\{w_1, w_2, \ldots, w_n\}$ for $V$ and $W$, respectively.
By Theorem 3.3, there exist linear transformations $T : V \to W$ and $S :
W \to V$ such that $T(v_i) = w_i$ and $S(w_i) = v_i$ for $i = 1, 2, \ldots, n$. Clearly,
$(S \circ T)(v_i) = v_i$ and $(T \circ S)(w_i) = w_i$ for $i = 1, 2, \ldots, n$, so by Corollary 3.4,
$S \circ T = Id$ and $T \circ S = Id$. This means that $S = T^{-1}$, and so $T$ is an isomorphism.


Corollary 3.8 Let $V$ and $W$ be vector spaces.
(1) If $\dim V = n$, then $V$ is isomorphic to the $n$-space $\mathbb{R}^n$.
(2) If $\dim V = \dim W$, any bijective transformation from a basis for $V$
to a basis for $W$ can be extended to an isomorphism from $V$ to $W$.


The identification between a vector space $V$ and $\mathbb{R}^n$ in Corollary 3.8
depends on the choice of an isomorphism between the two spaces, and an
isomorphism in turn depends on the choices of bases for the two spaces, as shown in
Theorem 3.7. However, an isomorphism is uniquely determined if we fix the
bases, in which the order of the vectors is also fixed.

By an ordered basis for a vector space we mean a basis endowed
with a specific order among the basis vectors. For example, the two bases
$\{e_1, e_2, e_3\}$ and $\{e_3, e_2, e_1\}$ of the 3-space $\mathbb{R}^3$ are different as ordered bases
even though they are the same basis as unordered ones. Likewise, a nontrivial
permutation of the vectors in a basis $\{v_1, v_2, \ldots, v_n\}$ of an $n$-dimensional vector
space $V$ gives rise to a different ordered basis for $V$. In the following, all the bases
are ordered bases unless otherwise stated.

Let $V$ be a vector space of dimension $n$ with a basis $\alpha = \{v_1, v_2, \ldots, v_n\}$,
and let $\beta = \{e_1, e_2, \ldots, e_n\}$ be the standard basis for $\mathbb{R}^n$. Then the
isomorphism $\Phi : V \to \mathbb{R}^n$ defined by $\Phi(v_i) = e_i$ is called the natural
isomorphism with respect to the basis $\alpha$. By this isomorphism, a vector in $V$ can
be identified with a column vector in $\mathbb{R}^n$. In fact, for any $x = \sum_{i=1}^{n} a_iv_i \in V$,
the image of $x$ under this natural isomorphism is written as

$$\Phi(x) = \sum_{i=1}^{n} a_i\Phi(v_i) = \sum_{i=1}^{n} a_ie_i = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} \in \mathbb{R}^n,$$

which is called the coordinate vector of $x$ with respect to the basis $\alpha$, and
is denoted by $[x]_\alpha$. Clearly $[v_i]_\alpha = e_i$.


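Example 3.8 (1) Recall that, from Example 3.2, the rotation by the angle
$\theta$ of $\mathbb{R}^2$ is given by the matrix

$$R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

Clearly, it is invertible and hence is an isomorphism of $\mathbb{R}^2$. In fact, one can
easily check that the inverse $R_\theta^{-1}$ is another rotation $R_{-\theta}$.
(2) Let $\alpha = \{e_1, e_2\}$ be the standard basis, and let $\beta = \{v_1, v_2\}$, where
$v_i = R_\theta e_i$, $i = 1, 2$. Then $\beta$ is also a basis for $\mathbb{R}^2$. The coordinate vectors of
$v_i$ with respect to $\alpha$ are themselves:

$$[v_1]_\alpha = \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}, \qquad [v_2]_\alpha = \begin{bmatrix} -\sin\theta \\ \cos\theta \end{bmatrix},$$

while

$$[v_1]_\beta = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad [v_2]_\beta = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

If we choose $\alpha' = \{e_2, e_1\}$ as a different ordered basis for $\mathbb{R}^2$, then the
coordinate vectors of $v_i$ with respect to $\alpha'$ are

$$[v_1]_{\alpha'} = \begin{bmatrix} \sin\theta \\ \cos\theta \end{bmatrix}, \qquad [v_2]_{\alpha'} = \begin{bmatrix} \cos\theta \\ -\sin\theta \end{bmatrix}.$$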


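Example 3.9 In the plane $\mathbb{R}^2$, the reflection about the line $y = x$ can
be obtained as the composition of the rotation by $-\frac{\pi}{4}$ of the plane, the
reflection about the $x$-axis, and the rotation by $\frac{\pi}{4}$. Actually, it is a product
of the matrices given in (1) and (3) of Example 3.2 with $\theta = \frac{\pi}{4}$: note that the
rotation by $\frac{\pi}{4}$ is

$$R_{\frac{\pi}{4}} = \begin{bmatrix} \cos\frac{\pi}{4} & -\sin\frac{\pi}{4} \\ \sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{bmatrix} = \frac{\sqrt{2}}{2}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix},$$

the reflection about the $x$-axis is $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$, and $R_{-\frac{\pi}{4}} = R_{\frac{\pi}{4}}^{-1}$. Hence,
the matrix for the reflection about the line $y = x$ is

$$R_{\frac{\pi}{4}}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}R_{-\frac{\pi}{4}} = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

In general, the reflection about a line $\ell$ through the origin in the plane can be
expressed as the composition $R_\theta \circ T \circ R_{-\theta}$, where $T$ is the reflection about the
$x$-axis and $\theta$ is the angle between the $x$-axis and the line $\ell$ (see Figure 3.2).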
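The composition $R_\theta \circ T \circ R_{-\theta}$ can be evaluated numerically. The Python sketch below, with hypothetical helper names `rotation`, `matmul`, `reflection_about_line` and `close`, multiplies $2 \times 2$ matrices in floating point; it also checks the claim of Example 3.8 (1) that $R_\theta R_{-\theta} = I$. Since $\cos$ and $\sin$ are evaluated in floating point, the results match the exact matrices only up to round-off, which is why a tolerance is used.

```python
import math

def rotation(theta):
    """R_theta: counterclockwise rotation of R^2 by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    """Product AB of matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

# Reflection about the x-axis (Example 3.2 (3)).
T = [[1.0, 0.0], [0.0, -1.0]]

def reflection_about_line(theta):
    """Matrix of R_theta o T o R_{-theta}: the reflection about the line
    through the origin making the angle theta with the x-axis."""
    return matmul(matmul(rotation(theta), T), rotation(-theta))

def close(A, B, tol=1e-12):
    """Entrywise comparison up to the tolerance tol."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

print(close(matmul(rotation(0.7), rotation(-0.7)), [[1, 0], [0, 1]]))  # True
print(close(reflection_about_line(math.pi / 4), [[0, 1], [1, 0]]))     # True
```

Multiplying the three matrices out by hand, with the double-angle formulas, gives the closed form $\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}$, which the numerical product agrees with.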




Figure 3.2: The reflection $R_\theta \circ T \circ R_{-\theta}$


Problem 3.8 Find the matrix of the reflection about the line $y = \sqrt{3}x$ in $\mathbb{R}^2$.


Problem 3.9 Find the coordinate vector of $5 + 2x + 3x^2$ with respect to the given
ordered basis $\alpha$ for $P_2(\mathbb{R})$:
(1) a= {1, x, z°}; (2) a={1 +z, 1+2, +2}. 


3.3 Projections 


In this section, we introduce a special kind of linear transformation, called
a projection, which will play an important role in Chapter 5.

Let $U$ be a subspace of a vector space $V$. Then by Corollary 2.13 there
is a complementary subspace $W$ of $U$ in $V$ such that $V = U \oplus W$. Thus
any vector $x \in V$ has a unique expression as $x = u + w$ for $u \in U$ and
$w \in W$. As an easy exercise, one can show that the function $T : V \to V$
defined by $T(x) = T(u + w) = u$ is a linear transformation, whose image
$\operatorname{Im}(T) = T(V)$ is the subspace $U$ and whose kernel $\operatorname{Ker}(T)$ is the subspace $W$.


Definition 3.4 Let $T : V \to V$ be a linear transformation with image space $U \subseteq V$. If there is a subspace $W$ of $V$ such that $V = U \oplus W$ and $T(\mathbf{x}) = \mathbf{u}$ for $\mathbf{x} = \mathbf{u} + \mathbf{w} \in U \oplus W$, then $T$ is called a projection of $V$ onto $U$ along $W$.


Notice that $\mathrm{Im}(T) = U$ and $\mathrm{Ker}(T) = W$. For a given subspace $U$ of $V$, there can be many projections onto $U$, depending on the choice of a complementary subspace of $U$ in $V$. However, for a fixed complementary subspace $W$ of $U$, the projection $T$ onto $U$ along $W = \mathrm{Ker}(T)$ is uniquely determined (see Example 3.10). One can envisage that a projection $T$ plays the role of sunlight: a complementary subspace $W$ is the direction along which the light shines, the image under $T$ is the shadow of an object, and $U$ is the screen.


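Example 3.10 Let us consider the following 1-dimensional subspaces of the Euclidean 2-space $\mathbb{R}^2$:

$$U = \{r\mathbf{e}_1 : r \in \mathbb{R}\} = x\text{-axis}, \quad W = \{r\mathbf{e}_2 : r \in \mathbb{R}\} = y\text{-axis}, \quad V = \{r(\mathbf{e}_1 + \mathbf{e}_2) : r \in \mathbb{R}\}.$$

It is not hard to see that $\mathbb{R}^2 = U \oplus W = U \oplus V$. Thus a vector $\mathbf{x} = (2, 1) \in \mathbb{R}^2$ may be written in two ways:

$$\mathbf{x} = 2(1, 0) + (0, 1) \in U \oplus W = \mathbb{R}^2,$$
$$\mathbf{x} = (1, 0) + (1, 1) \in U \oplus V = \mathbb{R}^2.$$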


Figure 3.3: Two decompositions of $\mathbb{R}^2$


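Let $T_U$ and $S_U$ denote the projections of $\mathbb{R}^2$ onto $U$ along $W$ and along $V$, respectively, and let $T_W$ and $S_V$ denote the complementary projections onto $W$ and onto $V$ along $U$. Then

$$T_U(\mathbf{x}) = 2(1, 0) \in U, \quad T_W(\mathbf{x}) = (0, 1) \in W, \quad \text{and} \quad S_U(\mathbf{x}) = (1, 0) \in U, \quad S_V(\mathbf{x}) = (1, 1) \in V.$$

This shows that a projection of $\mathbb{R}^2$ onto the subspace $U$ (the $x$-axis) depends on the choice of complementary subspace of $U$.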
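The two decompositions in Example 3.10 can be reproduced by solving $\mathbf{x} = a\mathbf{u} + b\mathbf{w}$ for the coefficient along a chosen pair of spanning vectors. The sketch below is a minimal illustration for the plane only, assuming each 1-dimensional subspace is given by a single spanning vector; `project` is a hypothetical helper name, and exact `Fraction` arithmetic avoids round-off.

```python
from fractions import Fraction


def project(x, u, w):
    """Projection of x in R^2 onto span{u} along span{w}.

    Writes x = a*u + b*w (Cramer's rule; u and w must be linearly
    independent) and returns the U-component a*u."""
    det = u[0] * w[1] - u[1] * w[0]
    if det == 0:
        raise ValueError("u and w must be linearly independent")
    a = Fraction(x[0] * w[1] - x[1] * w[0], det)
    return (a * u[0], a * u[1])


x = (2, 1)
e1, e2, d = (1, 0), (0, 1), (1, 1)  # U = x-axis, W = y-axis, V = span{(1, 1)}

T_U = project(x, e1, e2)  # onto U along W
S_U = project(x, e1, d)   # onto U along V
print(T_U == (2, 0), S_U == (1, 0))  # True True

# The complementary components x - T_U(x) and x - S_U(x) lie in W and V.
T_W = (x[0] - T_U[0], x[1] - T_U[1])  # equals (0, 1)
S_V = (x[0] - S_U[0], x[1] - S_U[1])  # equals (1, 1)
```

The same vector $\mathbf{x}$ thus has different $U$-components depending on which complement is used, exactly as the example states.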


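Notice that if $T : V \to V$ is a projection onto $U$, then by definition $T(\mathbf{u}) = T(\mathbf{u} + \mathbf{0}) = \mathbf{u}$ for any $\mathbf{u} \in U$ and for any choice of $W$ among the complementary subspaces. Thus $T \circ T(\mathbf{x}) = T(T(\mathbf{x})) = T(\mathbf{u}) = \mathbf{u} = T(\mathbf{x})$ for any $\mathbf{x} = \mathbf{u} + \mathbf{w} \in V$, i.e., $T \circ T = T$ for any projection $T$ of $V$.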




Theorem 3.9 A linear transformation $T : V \to V$ is a projection onto a subspace $U$ if and only if $T = T^2$ ($= T \circ T$ by definition).


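Proof: We have already seen that $T \circ T = T$ for any projection $T$. Conversely, suppose $T^2 = T$. We show that $V = \mathrm{Im}(T) \oplus \mathrm{Ker}(T)$ and $T(\mathbf{u} + \mathbf{w}) = \mathbf{u}$ for any $\mathbf{u} + \mathbf{w} \in \mathrm{Im}(T) \oplus \mathrm{Ker}(T) = V$.

For the first claim, we prove $\mathrm{Im}(T) \cap \mathrm{Ker}(T) = \{\mathbf{0}\}$ and $V = \mathrm{Im}(T) + \mathrm{Ker}(T)$. Indeed, if $\mathbf{u} \in \mathrm{Im}(T) \cap \mathrm{Ker}(T)$, then there exists $\mathbf{x} \in V$ such that $T(\mathbf{x}) = \mathbf{u}$ and $T(\mathbf{u}) = \mathbf{0}$. Now

$$\mathbf{u} = T(\mathbf{x}) = T^2(\mathbf{x}) = T(T(\mathbf{x})) = T(\mathbf{u}) = \mathbf{0}$$

proves $\mathrm{Im}(T) \cap \mathrm{Ker}(T) = \{\mathbf{0}\}$. Note that the same computation shows $T(\mathbf{u}) = \mathbf{u}$ for every $\mathbf{u} \in \mathrm{Im}(T)$. On the other hand, for any $\mathbf{x} \in V$, $\mathbf{x} - T(\mathbf{x}) \in \mathrm{Ker}(T)$, since $T(\mathbf{x} - T(\mathbf{x})) = T(\mathbf{x}) - T^2(\mathbf{x}) = \mathbf{0}$. Now $\mathbf{x} = T(\mathbf{x}) + (\mathbf{x} - T(\mathbf{x})) \in \mathrm{Im}(T) + \mathrm{Ker}(T)$ shows $V = \mathrm{Im}(T) + \mathrm{Ker}(T)$.

For the second claim, note that $T(\mathbf{u} + \mathbf{w}) = T(\mathbf{u}) + T(\mathbf{w}) = T(\mathbf{u}) = \mathbf{u}$ for any $\mathbf{u} + \mathbf{w} \in \mathrm{Im}(T) \oplus \mathrm{Ker}(T)$.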


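Thus if $T : V \to V$ is a projection of $V$, then $V = \mathrm{Im}(T) \oplus \mathrm{Ker}(T)$. It is not difficult to show that $\mathrm{Im}(\mathrm{Id}_V - T) = \mathrm{Ker}(T)$ and $\mathrm{Ker}(\mathrm{Id}_V - T) = \mathrm{Im}(T)$, where $\mathrm{Id}_V$ is the identity transformation on $V$.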


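Corollary 3.10 A linear transformation $T : V \to V$ is a projection of $V$ onto a subspace $U$ along $W$ if and only if $\mathrm{Id}_V - T$ is the projection of $V$ onto $W$ along $U$.

Proof: It is enough to show that $(\mathrm{Id}_V - T) \circ (\mathrm{Id}_V - T) = \mathrm{Id}_V - T$. But

$$(\mathrm{Id}_V - T) \circ (\mathrm{Id}_V - T) = \mathrm{Id}_V - 2T + T^2 = \mathrm{Id}_V - T,$$

since $T^2 = T$.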


Note that $(\mathrm{Id}_V - T) \circ T = T - T^2 = 0$, the zero transformation. Projections are discussed further in Section 5.5.
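Theorem 3.9 and Corollary 3.10 are easy to test on a concrete matrix. The sketch below (plain Python, hypothetical helper names) uses the matrix of $S_U$ from Example 3.10, which follows from $(x, y) = (x - y)(1, 0) + y(1, 1)$.

```python
def matmul(A, B):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


# Matrix of S_U (onto the x-axis along span{(1, 1)}): S_U(x, y) = (x - y, 0).
P = [[1, -1], [0, 0]]
I = [[1, 0], [0, 1]]
Q = [[I[i][j] - P[i][j] for j in range(2)] for i in range(2)]  # Id - P

print(matmul(P, P) == P)  # True: P is idempotent (Theorem 3.9)
print(matmul(Q, Q) == Q)  # True: Id - P is again a projection (Corollary 3.10)
print(matmul(Q, P))       # [[0, 0], [0, 0]]: (Id - P) o P = 0
```

Applied to $\mathbf{x} = (2, 1)$, the complementary matrix `Q` gives $(1, 1) = S_V(\mathbf{x})$, matching Example 3.10.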


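Problem 3.10 For $V = U \oplus W$, let $T_U$ denote the projection of $V$ onto $U$ along $W$, and let $T_W$ denote the projection of $V$ onto $W$ along $U$. Prove the following.
(1) For any $\mathbf{x} \in V$, $\mathbf{x} = T_U(\mathbf{x}) + T_W(\mathbf{x})$.
(2) $T_U \circ (\mathrm{Id}_V - T_U) = 0$.
(3) $T_U \circ T_W = T_W \circ T_U = 0$.
(4) For any projection $T : V \to V$, $\mathrm{Im}(\mathrm{Id}_V - T) = \mathrm{Ker}(T)$ and $\mathrm{Ker}(\mathrm{Id}_V - T) = \mathrm{Im}(T)$.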




3.4 Matrices of linear transformations 


We have seen that the product of an $m \times n$ matrix $A$ and an $n \times 1$ column matrix $\mathbf{x}$ gives rise to a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$. In this section, we show that any linear transformation from one finite-dimensional vector space into another can be represented by a matrix.

Let $T : V \to W$ be a linear transformation from an $n$-dimensional vector space $V$ to an $m$-dimensional vector space $W$, and let $\alpha = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ and $\beta = \{\mathbf{w}_1, \ldots, \mathbf{w}_m\}$ be any bases for $V$ and $W$, respectively, which will be fixed throughout this section.

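Then by Theorem 3.3 the linear transformation $T$ is completely determined by its values on the basis $\alpha$. Write them as

$$\begin{aligned} T(\mathbf{v}_1) &= a_{11}\mathbf{w}_1 + a_{21}\mathbf{w}_2 + \cdots + a_{m1}\mathbf{w}_m, \\ T(\mathbf{v}_2) &= a_{12}\mathbf{w}_1 + a_{22}\mathbf{w}_2 + \cdots + a_{m2}\mathbf{w}_m, \\ &\;\;\vdots \\ T(\mathbf{v}_n) &= a_{1n}\mathbf{w}_1 + a_{2n}\mathbf{w}_2 + \cdots + a_{mn}\mathbf{w}_m, \end{aligned}$$

or, in short form,

$$T(\mathbf{v}_j) = \sum_{i=1}^{m} a_{ij}\mathbf{w}_i \quad \text{for } 1 \le j \le n,$$

for some scalars $a_{ij}$ ($i = 1, 2, \ldots, m$; $j = 1, 2, \ldots, n$).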
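Now for any vector $\mathbf{x} = \sum_{j=1}^{n} x_j\mathbf{v}_j \in V$,

$$T(\mathbf{x}) = \sum_{j=1}^{n} x_j T(\mathbf{v}_j) = \sum_{j=1}^{n} x_j \sum_{i=1}^{m} a_{ij}\mathbf{w}_i = \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} a_{ij}x_j \Big) \mathbf{w}_i.$$

Therefore, the coordinate vector of $T(\mathbf{x})$ with respect to the basis $\beta$ in $W$ is

$$[T(\mathbf{x})]_\beta = \begin{bmatrix} \sum_{j} a_{1j}x_j \\ \vdots \\ \sum_{j} a_{mj}x_j \end{bmatrix} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = A[\mathbf{x}]_\alpha.$$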


That is, for any $\mathbf{x} \in V$ the coordinate vector $[T(\mathbf{x})]_\beta$ of $T(\mathbf{x})$ in $W$ is just the product of a fixed matrix $A$ and the coordinate vector $[\mathbf{x}]_\alpha$ of $\mathbf{x}$. The situation can be incorporated in the commutative diagram shown in Figure 3.4 with the natural isomorphisms $\Phi$ and $\Psi$ defined in Section 3.2. By the commutativity of the diagram, we mean that $A \circ \Phi = \Psi \circ T$.


Figure 3.4: The associated matrix for T 


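Note that

$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix} = \big[\, [T(\mathbf{v}_1)]_\beta \ \cdots \ [T(\mathbf{v}_n)]_\beta \,\big]$$

is the matrix whose $j$-th column vector is just the coordinate vector $[T(\mathbf{v}_j)]_\beta$ of $T(\mathbf{v}_j)$ with respect to the basis $\beta$. In fact, $A = [a_{ij}]$ is just the transpose of the coefficient matrix in the expressions of the $T(\mathbf{v}_j)$ with respect to the basis $\beta$ in $W$. Note that this matrix $A$ is unique, since the coordinate expression of a vector with respect to a fixed basis is unique.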


Definition 3.5 The matrix $A$ is called the associated matrix of $T$ (or the matrix representation of $T$) with respect to the bases $\alpha$ and $\beta$, and is denoted by $A = [T]_\alpha^\beta$. When $V = W$ and $\alpha = \beta$, we write $[T]_\alpha$ for $[T]_\alpha^\alpha$.


The discussion of the matrix representation of a linear transformation so far can be summarized in the following theorem.


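Theorem 3.11 Let $T : V \to W$ be a linear transformation from an $n$-dimensional vector space $V$ to an $m$-dimensional vector space $W$. For fixed bases $\alpha = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ for $V$ and $\beta$ for $W$, there corresponds a unique $m \times n$ matrix $[T]_\alpha^\beta$ associated with $T$ such that, for any vector $\mathbf{x} \in V$, the coordinate vector $[T(\mathbf{x})]_\beta$ of $T(\mathbf{x})$ with respect to $\beta$ is given as the matrix product of the associated matrix $[T]_\alpha^\beta$ and the coordinate vector $[\mathbf{x}]_\alpha$, i.e.,

$$[T(\mathbf{x})]_\beta = [T]_\alpha^\beta [\mathbf{x}]_\alpha.$$

The associated matrix $[T]_\alpha^\beta$ of $T$ is given as

$$[T]_\alpha^\beta = \big[\, [T(\mathbf{v}_1)]_\beta \ [T(\mathbf{v}_2)]_\beta \ \cdots \ [T(\mathbf{v}_n)]_\beta \,\big].$$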
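Theorem 3.11 translates directly into a procedure: compute each $T(\mathbf{v}_j)$, find its coordinate vector with respect to $\beta$, and place it in column $j$. The following is a minimal sketch for $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$ with bases given as lists of vectors. The function names `coordinates` and `associated_matrix` are hypothetical, and Gauss-Jordan elimination over `Fraction` is just one convenient way to solve for coordinates. The usage lines take the data of Example 3.13.

```python
from fractions import Fraction


def coordinates(basis, y):
    """Coordinate vector of y with respect to `basis` (n vectors of R^n),
    computed by Gauss-Jordan elimination over the rationals."""
    n = len(basis)
    # Augmented matrix [b_1 ... b_n | y]: the basis vectors become columns.
    M = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(y[i])]
         for i in range(n)]
    for col in range(n):
        # Assumes `basis` really is a basis, so a nonzero pivot always exists.
        pivot_row = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot_row] = M[pivot_row], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]


def associated_matrix(T, alpha, beta):
    """[T]_alpha^beta: column j is the coordinate vector [T(v_j)]_beta."""
    cols = [coordinates(beta, T(v)) for v in alpha]
    return [[cols[j][i] for j in range(len(alpha))] for i in range(len(beta))]


def T(v):
    """The map of Example 3.13: T(x, y) = (x + 2y, 0, 2x + 3y)."""
    x, y = v
    return (x + 2 * y, 0, 2 * x + 3 * y)


alpha = [(1, 0), (0, 1)]
beta = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
beta_prime = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]

print(associated_matrix(T, alpha, beta) == [[1, 2], [0, 0], [2, 3]])        # True
print(associated_matrix(T, alpha, beta_prime) == [[2, 3], [0, 0], [1, 2]])  # True
```

Reordering the target basis only permutes the rows of the matrix, as the example observes.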




The following examples illustrate the computation of the associated ma- 
trices for linear transformations. 


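Example 3.11 Let $\mathrm{Id} : V \to V$ be the identity transformation on a vector space $V$. Then for any ordered basis $\alpha$ for $V$, the matrix $[\mathrm{Id}]_\alpha = I$, the identity matrix, because if $\alpha = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$, then

$$\begin{aligned} \mathrm{Id}(\mathbf{v}_1) &= 1\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 0\mathbf{v}_n, \\ \mathrm{Id}(\mathbf{v}_2) &= 0\mathbf{v}_1 + 1\mathbf{v}_2 + \cdots + 0\mathbf{v}_n, \\ &\;\;\vdots \\ \mathrm{Id}(\mathbf{v}_n) &= 0\mathbf{v}_1 + 0\mathbf{v}_2 + \cdots + 1\mathbf{v}_n. \end{aligned}$$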


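Example 3.12 Let $T : P_1(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by

$$(T(f))(x) = x f(x).$$

Find the associated matrix $[T]_\alpha^\beta$ with respect to the ordered bases $\alpha = \{1, x\}$ and $\beta = \{1, x, x^2\}$ for $P_1(\mathbb{R})$ and $P_2(\mathbb{R})$, respectively.

Solution: Clearly,

$$\begin{aligned} (T(1))(x) &= x = 0 \cdot 1 + 1x + 0x^2, \\ (T(x))(x) &= x^2 = 0 \cdot 1 + 0x + 1x^2. \end{aligned}$$

Hence, the associated matrix for $T$ is the transpose of the coefficient matrix in this expression, that is,

$$[T]_\alpha^\beta = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}.$$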


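Example 3.13 Let $T : \mathbb{R}^2 \to \mathbb{R}^3$ be the linear transformation defined by $T(x, y) = (x + 2y, 0, 2x + 3y)$, and let $\alpha$ and $\beta$ be the standard bases for $\mathbb{R}^2$ and $\mathbb{R}^3$, respectively. Then

$$\begin{aligned} T(\mathbf{e}_1) &= T(1, 0) = (1, 0, 2) = 1\mathbf{e}_1 + 0\mathbf{e}_2 + 2\mathbf{e}_3, \\ T(\mathbf{e}_2) &= T(0, 1) = (2, 0, 3) = 2\mathbf{e}_1 + 0\mathbf{e}_2 + 3\mathbf{e}_3. \end{aligned}$$

Hence $[T]_\alpha^\beta = \begin{bmatrix} 1 & 2 \\ 0 & 0 \\ 2 & 3 \end{bmatrix}$. If $\beta' = \{\mathbf{e}_3, \mathbf{e}_2, \mathbf{e}_1\}$, then $[T]_\alpha^{\beta'} = \begin{bmatrix} 2 & 3 \\ 0 & 0 \\ 1 & 2 \end{bmatrix}$.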




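Example 3.14 Let $T : \mathbb{R}^2 \to \mathbb{R}^2$ be a linear transformation given by $T(1, 1) = (0, 1)$ and $T(-1, 1) = (2, 3)$. Find the matrix representation $[T]_\alpha$ of $T$ with respect to the standard basis $\alpha = \{\mathbf{e}_1, \mathbf{e}_2\}$.

Solution: Note that $(a, b) = a\mathbf{e}_1 + b\mathbf{e}_2$ for any $(a, b) \in \mathbb{R}^2$. Thus the definition of $T$ gives

$$\begin{aligned} T(\mathbf{e}_1) + T(\mathbf{e}_2) &= T(\mathbf{e}_1 + \mathbf{e}_2) = T(1, 1) = (0, 1) = \mathbf{e}_2, \\ -T(\mathbf{e}_1) + T(\mathbf{e}_2) &= T(-\mathbf{e}_1 + \mathbf{e}_2) = T(-1, 1) = (2, 3) = 2\mathbf{e}_1 + 3\mathbf{e}_2. \end{aligned}$$

By solving these equations, we obtain

$$T(\mathbf{e}_1) = -\mathbf{e}_1 - \mathbf{e}_2, \qquad T(\mathbf{e}_2) = \mathbf{e}_1 + 2\mathbf{e}_2.$$

Therefore, $[T]_\alpha = \begin{bmatrix} -1 & 1 \\ -1 & 2 \end{bmatrix}$.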


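Example 3.15 Let $T$ be the linear transformation in Example 3.14. Find $[T]_\beta$ for the basis $\beta = \{\mathbf{v}_1, \mathbf{v}_2\}$, where $\mathbf{v}_1 = (0, 1)$ and $\mathbf{v}_2 = (2, 3)$.

Solution: From Example 3.14,

$$[T(\mathbf{v}_1)]_\alpha = [T]_\alpha [\mathbf{v}_1]_\alpha = \begin{bmatrix} -1 & 1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad [T(\mathbf{v}_2)]_\alpha = [T]_\alpha [\mathbf{v}_2]_\alpha = \begin{bmatrix} -1 & 1 \\ -1 & 2 \end{bmatrix} \begin{bmatrix} 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 1 \\ 4 \end{bmatrix}.$$

Writing these vectors as linear combinations of the basis vectors in $\beta$, we put

$$\begin{bmatrix} 1 \\ 2 \end{bmatrix} = a\mathbf{v}_1 + b\mathbf{v}_2 = \begin{bmatrix} 2b \\ a + 3b \end{bmatrix}, \quad \begin{bmatrix} 1 \\ 4 \end{bmatrix} = c\mathbf{v}_1 + d\mathbf{v}_2 = \begin{bmatrix} 2d \\ c + 3d \end{bmatrix}.$$

Solving for $a$, $b$, $c$ and $d$, we obtain

$$[T(\mathbf{v}_1)]_\beta = \begin{bmatrix} a \\ b \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 \\ 1 \end{bmatrix}, \quad [T(\mathbf{v}_2)]_\beta = \begin{bmatrix} c \\ d \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 5 \\ 1 \end{bmatrix}.$$

Therefore, $[T]_\beta = \dfrac{1}{2}\begin{bmatrix} 1 & 5 \\ 1 & 1 \end{bmatrix}$.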
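The computation in Example 3.15 can be checked with a few lines of exact arithmetic. This sketch (hypothetical helper names; Cramer's rule for the $2 \times 2$ systems) applies $[T]_\alpha$ to $\mathbf{v}_1, \mathbf{v}_2$ and then solves for the $\beta$-coordinates, exactly as in the solution.

```python
from fractions import Fraction

A = [[-1, 1], [-1, 2]]    # [T]_alpha from Example 3.14
v1, v2 = (0, 1), (2, 3)   # the basis beta of Example 3.15


def matvec(M, x):
    """Matrix-vector product M x."""
    return tuple(sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M)))


def coords(y, u, w):
    """Solve y = a*u + b*w for (a, b) by Cramer's rule."""
    det = u[0] * w[1] - u[1] * w[0]
    a = Fraction(y[0] * w[1] - y[1] * w[0], det)
    b = Fraction(u[0] * y[1] - u[1] * y[0], det)
    return a, b


# [T(v_j)]_alpha = [T]_alpha [v_j]_alpha, then convert to beta-coordinates.
c1 = coords(matvec(A, v1), v1, v2)  # (1/2, 1/2)
c2 = coords(matvec(A, v2), v1, v2)  # (5/2, 1/2)
T_beta = [[c1[0], c2[0]], [c1[1], c2[1]]]
half = Fraction(1, 2)
print(T_beta == [[half, 5 * half], [half, half]])  # True
```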




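Example 3.16 Let $T$ be a projection onto a subspace $U$ of $V$ along another subspace $W$. Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_k\}$ and $\{\mathbf{v}_{k+1}, \ldots, \mathbf{v}_n\}$ be bases for $U$ and $W$, respectively. Since $V = U \oplus W$, together they form a basis $\alpha$ for $V$, and since $T(\mathbf{v}_j) = \mathbf{v}_j$ for $j \le k$ and $T(\mathbf{v}_j) = \mathbf{0}$ for $j > k$, the matrix of $T$ is

$$[T]_\alpha = \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix},$$

where $I_k$ is the $k \times k$ identity matrix.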


Remark: (1) Recall that any $m \times n$ matrix $A$ can be considered as a linear transformation from the $n$-space $\mathbb{R}^n$ to the $m$-space $\mathbb{R}^m$ via $\mathbf{x} \mapsto A\mathbf{x}$. Clearly, its matrix representation with respect to the standard bases $\alpha$ for $\mathbb{R}^n$ and $\beta$ for $\mathbb{R}^m$ is the matrix $A$ itself, i.e., $A = [A]_\alpha^\beta$. (Note that $A\mathbf{e}_j$ is just the $j$-th column vector of $A$.) In particular, if $A$ is an invertible $n \times n$ square matrix, then the column vectors $\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n$ form another basis $\gamma$ for $\mathbb{R}^n$. Thus $A$ is simply the linear transformation on $\mathbb{R}^n$ that takes the standard basis $\alpha$ to $\gamma$; in fact,

$$A\mathbf{e}_j = \begin{bmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{bmatrix} = \mathbf{c}_j,$$

the $j$-th column of $A$, so that its matrix representation $[A]_\alpha^\gamma$ is the identity matrix.

(2) Let $V$ and $W$ be vector spaces with bases $\alpha$ and $\beta$, respectively, and let $T : V \to W$ be a linear transformation with the matrix representation $[T]_\alpha^\beta = A$. Then it is clear that $\mathrm{Ker}(T)$ and $\mathrm{Im}(T)$ are isomorphic to the null space $\mathcal{N}(A)$ and the column space $\mathcal{C}(A)$, respectively, via the natural isomorphisms. In particular, if $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$ with the standard bases, then $\mathrm{Ker}(T) = \mathcal{N}(A)$ and $\mathrm{Im}(T) = \mathcal{C}(A)$. Therefore, from Corollary 2.17, we have

$$\dim(\mathrm{Ker}(T)) + \dim(\mathrm{Im}(T)) = \dim V.$$


(3) Let $A\mathbf{x} = \mathbf{b}$ be a system of linear equations for an $m \times n$ matrix $A$. By considering the coefficient matrix $A$ as a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$, one obtains more equivalent conditions than those in Theorems 2.24 and 2.25: the condition $\mathcal{C}(A) = \mathbb{R}^m$ in Theorem 2.24 is equivalent to the condition that $A$ is surjective, and the condition $\mathcal{N}(A) = \{\mathbf{0}\}$ in Theorem 2.25 is equivalent to the condition that $A$ is one-to-one. This observation gives the proof of the following theorem.


Theorem 3.12 For a square matrix $A$ of order $n$, the following statements are equivalent.

(1) $A$ is invertible.

(2) The linear transformation $A : \mathbb{R}^n \to \mathbb{R}^n$ given by $A(\mathbf{x}) = A\mathbf{x}$ is injective.

(3) The linear transformation $A : \mathbb{R}^n \to \mathbb{R}^n$ is surjective.


Problem 3.11 Find the matrix representations $[T]_\alpha$ and $[T]_\beta$ of each of the following linear transformations $T$ on $\mathbb{R}^3$ with respect to the standard basis $\alpha = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ and $\beta = \{\mathbf{e}_3, \mathbf{e}_2, \mathbf{e}_1\}$:
(1) $T(x, y, z) = (2x - 3y + 4z,\ 5x - y + 2z,\ 4x + 7y)$,
(2) $T(x, y, z) = (2y + z,\ x - 4y,\ 3x)$.


Problem 3.12 Let $T : \mathbb{R}^4 \to \mathbb{R}^3$ be the linear transformation defined by

$$T(x, y, z, u) = (x + 2y,\ x - 3z + u,\ 2y + 3z + 4u).$$

Let $\alpha$ and $\beta$ be the standard bases for $\mathbb{R}^4$ and $\mathbb{R}^3$, respectively. Find $[T]_\alpha^\beta$.


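Problem 3.13 Let $\mathrm{Id} : \mathbb{R}^n \to \mathbb{R}^n$ be the identity transformation. Let $\mathbf{x}_k$ denote the vector in $\mathbb{R}^n$ whose first $k - 1$ coordinates are zero and whose last $n - k + 1$ coordinates are 1. Then clearly $\beta = \{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ is a basis for $\mathbb{R}^n$ (see Problem 2.11). Let $\alpha = \{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$ be the standard basis for $\mathbb{R}^n$. Find the matrix representations $[\mathrm{Id}]_\alpha^\beta$ and $[\mathrm{Id}]_\beta^\alpha$.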


3.5 Vector spaces of linear transformations 


Let $V$ and $W$ be two vector spaces of dimensions $n$ and $m$. Let $L(V; W)$ denote the set of all linear transformations from $V$ to $W$, i.e.,

$$L(V; W) = \{T : T \text{ is a linear transformation from } V \text{ to } W\}.$$


Definition 3.6 For any two linear transformations $S$ and $T$ in $L(V; W)$ and $\lambda \in \mathbb{R}$, define the sum $S + T$ and the scalar multiplication $\lambda S$ by

$$(S + T)(\mathbf{v}) = S(\mathbf{v}) + T(\mathbf{v}) \quad \text{and} \quad (\lambda S)(\mathbf{v}) = \lambda(S(\mathbf{v}))$$

for any $\mathbf{v} \in V$.

Clearly, the sum $S + T$ and the scalar multiple $\lambda S$ are also linear, so that $L(V; W)$ becomes a vector space.

Let $\alpha$ and $\beta$ be bases for $V$ and $W$, respectively, and let $T : V \to W$ be a linear transformation. Then the associated matrix $[T]_\alpha^\beta$ of $T$ with respect to these bases is uniquely determined by Theorem 3.11. That is, the transformation $\phi : L(V; W) \to M_{m \times n}(\mathbb{R})$ given by

$$\phi(T) = [T]_\alpha^\beta \in M_{m \times n}(\mathbb{R})$$

for any $T \in L(V; W)$ is well defined (see Section 3.4).




Lemma 3.13 The transformation $\phi : L(V; W) \to M_{m \times n}(\mathbb{R})$ is a one-to-one correspondence between $L(V; W)$ and $M_{m \times n}(\mathbb{R})$.

Proof: (1) It is one-to-one: If $[S]_\alpha^\beta = [T]_\alpha^\beta$ for $S$ and $T$ in $L(V; W)$, then $S = T$ by Corollary 3.4.

(2) It is onto: For any $m \times n$ matrix $A$ (considered as a linear transformation from $\mathbb{R}^n$ to $\mathbb{R}^m$), define a linear transformation $T : V \to W$ by $T = \Psi^{-1} \circ A \circ \Phi$, the composition of $A$ with the natural isomorphisms $\Phi$ and $\Psi$. Then clearly $[T]_\alpha^\beta = A$, i.e., $\phi$ is onto.


Furthermore, the following theorem shows that $\phi$ is linear, so that it is in fact an isomorphism from $L(V; W)$ to $M_{m \times n}(\mathbb{R})$.


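Theorem 3.14 Let $V$ and $W$ be vector spaces with ordered bases $\alpha$ and $\beta$, respectively, and let $S, T : V \to W$ be linear. Then we have

$$[S + T]_\alpha^\beta = [S]_\alpha^\beta + [T]_\alpha^\beta \quad \text{and} \quad [kS]_\alpha^\beta = k[S]_\alpha^\beta.$$

Proof: Let $\alpha = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$ and $\beta = \{\mathbf{w}_1, \ldots, \mathbf{w}_m\}$. Then we have unique expressions $S(\mathbf{v}_j) = \sum_{i=1}^{m} a_{ij}\mathbf{w}_i$ and $T(\mathbf{v}_j) = \sum_{i=1}^{m} b_{ij}\mathbf{w}_i$ for each $1 \le j \le n$, so that $[S]_\alpha^\beta = [a_{ij}]$ and $[T]_\alpha^\beta = [b_{ij}]$. Hence

$$(S + T)(\mathbf{v}_j) = \sum_{i=1}^{m} a_{ij}\mathbf{w}_i + \sum_{i=1}^{m} b_{ij}\mathbf{w}_i = \sum_{i=1}^{m} (a_{ij} + b_{ij})\mathbf{w}_i.$$

Thus

$$[S + T]_\alpha^\beta = [S]_\alpha^\beta + [T]_\alpha^\beta.$$

The proof of the second equality $[kS]_\alpha^\beta = k[S]_\alpha^\beta$ is left as an exercise.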


In particular, if $V = \mathbb{R}^n$ and $W = \mathbb{R}^m$, then the vector space $M_{m \times n}(\mathbb{R})$ of $m \times n$ matrices may be identified with the vector space $L(\mathbb{R}^n; \mathbb{R}^m)$, since such a matrix $A$ is a linear transformation and $A$ is its own matrix representation with respect to the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$.

Thus one can summarize these results in the following corollary:


Corollary 3.15 Let $V$ and $W$ be two vector spaces of dimensions $n$ and $m$ with fixed bases $\alpha$ and $\beta$, respectively. Then the vector space $L(V; W)$ of all linear transformations from $V$ to $W$ and the vector space $M_{m \times n}(\mathbb{R})$ of all $m \times n$ matrices are isomorphic, so that

$$\dim L(V; W) = \dim M_{m \times n}(\mathbb{R}) = mn = \dim V \cdot \dim W.$$




Remark: For a linear transformation $T : V \to W$ between vector spaces $V$ and $W$ of the same dimension, the following conditions are equivalent, by applying Theorem 2.26 to $[T]_\alpha^\beta$:

(i) $T$ is an isomorphism,

(ii) $T$ is one-to-one,

(iii) $T$ is surjective.
(One can also prove them directly by using the definition of a basis for $V$. See Problem 3.7.)


The next theorem shows that the one-to-one correspondence between $L(V; W)$ and $M_{m \times n}(\mathbb{R})$ preserves not only the vector space structure but also the composition of linear transformations. Let $V$, $W$ and $Z$ be vector spaces. Suppose that $S : V \to W$ and $T : W \to Z$ are linear transformations. Then clearly the composition $T \circ S : V \to Z$ is also linear.


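Theorem 3.16 Let $V$, $W$ and $Z$ be vector spaces with bases $\alpha$, $\beta$ and $\gamma$, respectively. Suppose that $S : V \to W$ and $T : W \to Z$ are linear transformations. Then

$$[T \circ S]_\alpha^\gamma = [T]_\beta^\gamma [S]_\alpha^\beta.$$

Proof: Let $\alpha = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$, $\beta = \{\mathbf{w}_1, \ldots, \mathbf{w}_m\}$ and $\gamma = \{\mathbf{z}_1, \ldots, \mathbf{z}_\ell\}$. Let $[T]_\beta^\gamma = [a_{jk}]$ and $[S]_\alpha^\beta = [b_{ki}]$. Then, for $1 \le i \le n$,

$$(T \circ S)(\mathbf{v}_i) = T(S(\mathbf{v}_i)) = T\Big(\sum_{k=1}^{m} b_{ki}\mathbf{w}_k\Big) = \sum_{k=1}^{m} b_{ki} T(\mathbf{w}_k) = \sum_{k=1}^{m} b_{ki} \sum_{j=1}^{\ell} a_{jk}\mathbf{z}_j = \sum_{j=1}^{\ell} \Big( \sum_{k=1}^{m} a_{jk}b_{ki} \Big) \mathbf{z}_j.$$

Since $\sum_{k} a_{jk}b_{ki}$ is the $(j, i)$-entry of $[T]_\beta^\gamma [S]_\alpha^\beta$, this shows $[T \circ S]_\alpha^\gamma = [T]_\beta^\gamma [S]_\alpha^\beta$.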
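Theorem 3.16 can be sanity-checked on any pair of composable maps. In this sketch the maps `S` and `T` are hypothetical examples chosen only for illustration, and standard bases are used throughout, so each matrix is read off column by column from the images of the standard basis vectors.

```python
def matrix_of(F, n):
    """Standard matrix of a linear map F defined on R^n: column j is F(e_j)."""
    cols = [F(tuple(1 if i == j else 0 for i in range(n))) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(len(cols[0]))]


def matmul(A, B):
    """Matrix product of A and B, both stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def S(v):
    """Hypothetical example S: R^2 -> R^3."""
    x, y = v
    return (x + y, 2 * x, -y)


def T(w):
    """Hypothetical example T: R^3 -> R^2."""
    a, b, c = w
    return (a - c, 3 * b + c)


lhs = matrix_of(lambda v: T(S(v)), 2)            # [T o S]
rhs = matmul(matrix_of(T, 3), matrix_of(S, 2))   # [T][S]
print(lhs == rhs, lhs)  # True [[1, 2], [6, -1]]
```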


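Problem 3.14 Let $\alpha$ be the standard basis for $\mathbb{R}^3$, and let $S, T : \mathbb{R}^3 \to \mathbb{R}^3$ be two linear transformations given by

$$\begin{aligned} S(\mathbf{e}_1) &= (2, 2, 1), & S(\mathbf{e}_2) &= (0, 1, 2), & S(\mathbf{e}_3) &= (-1, 2, 1), \\ T(\mathbf{e}_1) &= (1, 0, 1), & T(\mathbf{e}_2) &= (0, 1, 1), & T(\mathbf{e}_3) &= (1, 1, 2). \end{aligned}$$

Compute $[S + T]_\alpha$, $[2T - S]_\alpha$ and $[T \circ S]_\alpha$.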




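Problem 3.15 Let $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by $T(f) = (3 + x)f' + 2f$, and let $S : P_2(\mathbb{R}) \to \mathbb{R}^3$ be defined by $S(a + bx + cx^2) = (a - b,\ a + b,\ c)$. For the basis $\alpha = \{1, x, x^2\}$ for $P_2(\mathbb{R})$ and the standard basis $\beta = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ for $\mathbb{R}^3$, compute $[S]_\alpha^\beta$, $[T]_\alpha$, and $[S \circ T]_\alpha^\beta$.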




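Theorem 3.17 Let $V$ and $W$ be vector spaces with bases $\alpha$ and $\beta$, respectively, and let $T : V \to W$ be an isomorphism. Then $[T]_\alpha^\beta$ is invertible, and

$$[T^{-1}]_\beta^\alpha = \big([T]_\alpha^\beta\big)^{-1}.$$

Proof: Since $T$ is invertible, $\dim V = \dim W$, and the matrices $[T]_\alpha^\beta$ and $[T^{-1}]_\beta^\alpha$ are square and of the same size. Thus, by Theorem 3.16,

$$[T]_\alpha^\beta [T^{-1}]_\beta^\alpha = [T \circ T^{-1}]_\beta = [\mathrm{Id}_W]_\beta$$

is the identity matrix. Hence $[T^{-1}]_\beta^\alpha = \big([T]_\alpha^\beta\big)^{-1}$.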
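For the transformation of Example 3.14, Theorem 3.17 predicts that $[T^{-1}]_\alpha$ is the inverse of $[T]_\alpha$. A minimal sketch using the $2 \times 2$ adjugate formula (helper names hypothetical) confirms this and checks that $T^{-1}$ sends $(0, 1)$ and $(2, 3)$ back to $(1, 1)$ and $(-1, 1)$.

```python
from fractions import Fraction


def inverse2(M):
    """Inverse of an invertible 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]


A = [[-1, 1], [-1, 2]]   # [T]_alpha from Example 3.14
A_inv = inverse2(A)      # by Theorem 3.17 this is [T^{-1}]_alpha
print(A_inv == [[-2, 1], [-1, 1]])  # True
# T^{-1} undoes the defining data T(1, 1) = (0, 1) and T(-1, 1) = (2, 3).
print(matvec(A_inv, [0, 1]) == [1, 1], matvec(A_inv, [2, 3]) == [-1, 1])  # True True
```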


In particular, for the identity transformation $\mathrm{Id} : V \to V$, if $\alpha$ and $\beta$ are two bases for $V$, then $\big([\mathrm{Id}]_\alpha^\beta\big)^{-1} = [\mathrm{Id}]_\beta^\alpha$.


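Problem 3.16 For the vector spaces $P_1(\mathbb{R})$ and $\mathbb{R}^2$, choose the bases $\alpha = \{1, x\}$ for $P_1(\mathbb{R})$ and $\beta = \{\mathbf{e}_1, \mathbf{e}_2\}$ for $\mathbb{R}^2$, respectively. Let $T : P_1(\mathbb{R}) \to \mathbb{R}^2$ be the linear transformation defined by $T(a + bx) = (a, a + b)$.

(1) Show that $T$ is invertible. (2) Find $[T]_\alpha^\beta$ and $[T^{-1}]_\beta^\alpha$.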


3.6 Change of bases 


In Section 3.2, we have seen that, in a vector space $V$ with a fixed basis $\alpha$, any vector $\mathbf{x}$ can be identified with a column vector $[\mathbf{x}]_\alpha$ in the $n$-space $\mathbb{R}^n$ via the natural isomorphism $\Phi$. Of course, one may get a different column vector $[\mathbf{x}]_\beta$ if another basis $\beta$ is given instead of $\alpha$. Thus one may naturally ask what the relation between $[\mathbf{x}]_\alpha$ and $[\mathbf{x}]_\beta$ is for the two different bases.

To answer this question, let us begin with an example in the plane $\mathbb{R}^2$. The coordinate expression of $\mathbf{x} = (x, y) \in \mathbb{R}^2$ with respect to the standard basis $\alpha = \{\mathbf{e}_1, \mathbf{e}_2\}$ is $\mathbf{x} = x\mathbf{e}_1 + y\mathbf{e}_2$, so that $[\mathbf{x}]_\alpha = \begin{bmatrix} x \\ y \end{bmatrix}$.

Now let $\beta = \{\mathbf{v}_1, \mathbf{v}_2\}$ be another basis for $\mathbb{R}^2$ obtained by rotating $\alpha$ counterclockwise through an angle $\theta$ as in Figure 3.5, and suppose that the coordinate expression of $\mathbf{x} \in \mathbb{R}^2$ with respect to $\beta$ is written as $\mathbf{x} = u\mathbf{v}_1 + v\mathbf{v}_2$, or $[\mathbf{x}]_\beta = \begin{bmatrix} u \\ v \end{bmatrix}$.


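Then the expressions of the vectors in $\beta$ with respect to $\alpha$ are

$$\begin{aligned} \mathbf{v}_1 &= \mathrm{Id}(\mathbf{v}_1) = \cos\theta\,\mathbf{e}_1 + \sin\theta\,\mathbf{e}_2, \\ \mathbf{v}_2 &= \mathrm{Id}(\mathbf{v}_2) = -\sin\theta\,\mathbf{e}_1 + \cos\theta\,\mathbf{e}_2. \end{aligned}$$

Figure 3.5: Coordinates $\{\mathbf{e}_1, \mathbf{e}_2\}$ and $\{\mathbf{v}_1, \mathbf{v}_2\}$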
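Therefore, from $\mathbf{x} = x\mathbf{e}_1 + y\mathbf{e}_2$ and also

$$\mathbf{x} = u\mathbf{v}_1 + v\mathbf{v}_2 = (u\cos\theta - v\sin\theta)\mathbf{e}_1 + (u\sin\theta + v\cos\theta)\mathbf{e}_2,$$

one obtains the matrix equation

$$[\mathbf{x}]_\alpha = \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = [\mathrm{Id}]_\beta^\alpha [\mathbf{x}]_\beta,$$

where

$$[\mathrm{Id}]_\beta^\alpha = \big[\, [\mathbf{v}_1]_\alpha \ [\mathbf{v}_2]_\alpha \,\big] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

This means that the two coordinate vectors $[\mathbf{x}]_\alpha$ and $[\mathbf{x}]_\beta$ in the 2-space $\mathbb{R}^2$ are related by the associated matrix $[\mathrm{Id}]_\beta^\alpha$ of the identity transformation $\mathrm{Id}$ on $\mathbb{R}^2$. Note that

$$[\mathrm{Id}]_\alpha^\beta = \big([\mathrm{Id}]_\beta^\alpha\big)^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}$$

by Theorem 3.17.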
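Numerically, the relation $[\mathbf{x}]_\alpha = [\mathrm{Id}]_\beta^\alpha [\mathbf{x}]_\beta$ can be checked by converting a vector to the rotated coordinates and back. In this sketch the angle $\theta = \pi/3$ and the vector $\mathbf{x} = (1, 2)$ are arbitrary choices, and `matvec` is a hypothetical helper.

```python
import math

theta = math.pi / 3
c, s = math.cos(theta), math.sin(theta)
Q = [[c, -s], [s, c]]      # [Id]_beta^alpha: columns are [v1]_alpha and [v2]_alpha
Q_inv = [[c, s], [-s, c]]  # [Id]_alpha^beta = Q^{-1} (Theorem 3.17)


def matvec(M, x):
    """Matrix-vector product M x."""
    return [sum(M[i][k] * x[k] for k in range(len(x))) for i in range(len(M))]


x_alpha = [1.0, 2.0]             # x = 1 e1 + 2 e2
u, v = matvec(Q_inv, x_alpha)    # [x]_beta = Q^{-1} [x]_alpha
back = matvec(Q, [u, v])         # [x]_alpha = Q [x]_beta = u [v1]_alpha + v [v2]_alpha
print(all(abs(a - b) < 1e-12 for a, b in zip(back, x_alpha)))  # True
```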




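In general, if $\alpha = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ and $\beta = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n\}$ are two bases for an $n$-dimensional vector space $V$, then any vector $\mathbf{x} \in V$ has two expressions:

$$\mathbf{x} = \sum_{i=1}^{n} x_i\mathbf{v}_i = \sum_{j=1}^{n} y_j\mathbf{w}_j.$$

In particular, each vector in $\beta$ is expressed as a linear combination of the vectors in $\alpha$: say $\mathbf{w}_j = \mathrm{Id}(\mathbf{w}_j) = \sum_{i=1}^{n} q_{ij}\mathbf{v}_i$ for $j = 1, 2, \ldots, n$, so that

$$[\mathbf{w}_j]_\alpha = [\mathrm{Id}(\mathbf{w}_j)]_\alpha = \begin{bmatrix} q_{1j} \\ \vdots \\ q_{nj} \end{bmatrix}.$$

Then for any $\mathbf{x} \in V$,

$$\sum_{i=1}^{n} x_i\mathbf{v}_i = \sum_{j=1}^{n} y_j\mathbf{w}_j = \sum_{j=1}^{n} y_j \sum_{i=1}^{n} q_{ij}\mathbf{v}_i = \sum_{i=1}^{n} \Big( \sum_{j=1}^{n} q_{ij}y_j \Big) \mathbf{v}_i.$$

This is equivalent to the matrix equation $[\mathbf{x}]_\alpha = [\mathrm{Id}]_\beta^\alpha [\mathbf{x}]_\beta$, or

$$\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} q_{11} & \cdots & q_{1n} \\ \vdots & \ddots & \vdots \\ q_{n1} & \cdots & q_{nn} \end{bmatrix} \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix},$$

where

$$[\mathrm{Id}]_\beta^\alpha = \begin{bmatrix} q_{11} & \cdots & q_{1n} \\ \vdots & \ddots & \vdots \\ q_{n1} & \cdots & q_{nn} \end{bmatrix} = \big[\, [\mathbf{w}_1]_\alpha \ \cdots \ [\mathbf{w}_n]_\alpha \,\big].$$

Figure 3.6: Generating the coordinate change matrix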


This means that any two coordinate vectors of a vector in $V$ with respect to two different bases $\alpha$ and $\beta$ are related by the matrix representation $[\mathrm{Id}]_\beta^\alpha$ of the identity transformation on $V$, and this can be incorporated in the commutative diagram shown in Figure 3.6.


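Definition 3.7 The matrix representation $[\mathrm{Id}]_\beta^\alpha$ of the identity transformation $\mathrm{Id} : V \to V$ with respect to any two bases $\alpha$ and $\beta$ is called the transition matrix or the coordinate change matrix from $\beta$ to $\alpha$.

Since the identity transformation $\mathrm{Id} : V \to V$ is invertible, the transition matrix $Q = [\mathrm{Id}]_\beta^\alpha$ is also invertible by Theorem 3.17. If we had taken the expressions of the vectors in the basis $\alpha$ with respect to the basis $\beta$, $\mathbf{v}_j = \mathrm{Id}(\mathbf{v}_j) = \sum_{i} p_{ij}\mathbf{w}_i$ for $j = 1, 2, \ldots, n$, then $[p_{ij}] = [\mathrm{Id}]_\alpha^\beta = Q^{-1}$ and

$$[\mathbf{x}]_\beta = [\mathrm{Id}]_\alpha^\beta [\mathbf{x}]_\alpha = Q^{-1}[\mathbf{x}]_\alpha.$$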


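Example 3.17 In the plane $\mathbb{R}^2$, for the standard basis $\alpha = \{\mathbf{e}_1, \mathbf{e}_2\}$, the coordinate vector of $\mathbf{x}$ is $[\mathbf{x}]_\alpha = \begin{bmatrix} x \\ y \end{bmatrix}$. If we take another basis $\beta = \{\mathbf{v}_1, \mathbf{v}_2\}$ with $\mathbf{v}_1 = \mathbf{e}_1 + \mathbf{e}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$ and $\mathbf{v}_2 = -\mathbf{e}_1 + \mathbf{e}_2 = \begin{bmatrix} -1 \\ 1 \end{bmatrix}$, then the coordinate vector of $\mathbf{x} = x'\mathbf{v}_1 + y'\mathbf{v}_2$ with respect to $\beta$ is $[\mathbf{x}]_\beta = \begin{bmatrix} x' \\ y' \end{bmatrix}$. Hence we have

$$\begin{bmatrix} x \\ y \end{bmatrix} = [\mathrm{Id}]_\beta^\alpha \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix}, \quad \text{i.e.,} \quad x = x' - y', \quad y = x' + y'.$$

Figure 3.7: Hyperbola in $\mathbb{R}^2$

Thus the hyperbola $xy = 1$ in the plane $\mathbb{R}^2$, written in the usual coordinates $\alpha$, can also be written as $1 = xy = (x' - y')(x' + y') = x'^2 - y'^2$ in the new coordinates $\beta$.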
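The change of variables in Example 3.17 can be tested on sample points of the hyperbola. The sketch below uses exact `Fraction` arithmetic; the sample points are arbitrary choices, and $Q^{-1} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$ is the inverse of $[\mathrm{Id}]_\beta^\alpha$ computed by hand.

```python
from fractions import Fraction

Q = [[1, -1], [1, 1]]                        # [Id]_beta^alpha, columns v1, v2
Q_inv = [[Fraction(1, 2), Fraction(1, 2)],   # [Id]_alpha^beta = Q^{-1}
         [Fraction(-1, 2), Fraction(1, 2)]]

for x in (Fraction(2), Fraction(1, 3), Fraction(-5, 4)):
    y = 1 / x                                # (x, y) lies on xy = 1
    xp = Q_inv[0][0] * x + Q_inv[0][1] * y   # x'
    yp = Q_inv[1][0] * x + Q_inv[1][1] * y   # y'
    # Back to the old coordinates: [x]_alpha = Q [x]_beta.
    assert (x, y) == (Q[0][0] * xp + Q[0][1] * yp, Q[1][0] * xp + Q[1][1] * yp)
    assert xp ** 2 - yp ** 2 == 1            # the hyperbola in beta-coordinates
print("all sample points satisfy x'^2 - y'^2 = 1")
```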


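Example 3.18 Let the 3-space $\mathbb{R}^3$ be equipped with the standard $xyz$-coordinate system, i.e., with the standard basis $\alpha = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$. Take a new $x'y'z'$-coordinate system by rotating the $xyz$-system counterclockwise around its $z$-axis through an angle $\theta$, i.e., we take a new basis $\beta = \{\mathbf{e}_1', \mathbf{e}_2', \mathbf{e}_3'\}$ by rotating the basis $\alpha$ about the $z$-axis through $\theta$. Then we get

$$[\mathbf{e}_1']_\alpha = \begin{bmatrix} \cos\theta \\ \sin\theta \\ 0 \end{bmatrix}, \quad [\mathbf{e}_2']_\alpha = \begin{bmatrix} -\sin\theta \\ \cos\theta \\ 0 \end{bmatrix}, \quad [\mathbf{e}_3']_\alpha = \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}.$$

Hence, the transition matrix from $\beta$ to $\alpha$ is

$$Q = [\mathrm{Id}]_\beta^\alpha = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

so

$$[\mathbf{x}]_\alpha = \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = Q[\mathbf{x}]_\beta.$$

Moreover, $Q = [\mathrm{Id}]_\beta^\alpha$ is invertible and the transition matrix from $\alpha$ to $\beta$ is

$$Q^{-1} = [\mathrm{Id}]_\alpha^\beta = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix},$$

so that

$$\begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}.$$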
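A quick check of Example 3.18: the sketch builds $Q$ for an arbitrary angle ($\theta = 0.7$ here), confirms that the matrix given above for $Q^{-1}$ really inverts it, and notes that for this rotation $Q^{-1}$ coincides with the transpose of $Q$. Helper names are hypothetical.

```python
import math


def rot_z(theta):
    """Transition matrix Q = [Id]_beta^alpha for the basis rotated about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    """Matrix product of A and B, both stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def close(A, B, tol=1e-12):
    """Entrywise comparison up to floating-point round-off."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


theta = 0.7
c, s = math.cos(theta), math.sin(theta)
Q = rot_z(theta)
Q_inv = [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]   # [Id]_alpha^beta from the text
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

print(close(matmul(Q, Q_inv), I3) and close(matmul(Q_inv, Q), I3))  # True
print(close(Q_inv, [list(row) for row in zip(*Q)]))                 # True: Q^{-1} = Q^T
```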


Problem 3.17 Find the transition matrix from a basis $\alpha$ to another basis $\beta$ for the 3-space $\mathbb{R}^3$, where

$$\alpha = \{(1, 0, 1), (1, 1, 0), (0, 1, 1)\}, \quad \beta = \{(2, 3, 1), (1, 2, 0), (2, 0, 3)\}.$$




3.7 Similarity 


The coordinate expression of a vector in a vector space $V$ depends on the choice of an ordered basis. Hence the matrix representation of a linear transformation also depends on the choice of bases.

Let $V$ and $W$ be two vector spaces of dimensions $n$ and $m$ with bases $\alpha$ and $\beta$, respectively, and let $T : V \to W$ be a linear transformation. In Section 3.4, we discussed how to find the associated matrix $[T]_\alpha^\beta$. If one takes different bases $\alpha'$ and $\beta'$ for $V$ and $W$, respectively, then one may get another associated matrix $[T]_{\alpha'}^{\beta'}$ of $T$. In fact, we have two different expressions

$$[\mathbf{x}]_\alpha \text{ and } [\mathbf{x}]_{\alpha'} \text{ in } \mathbb{R}^n \text{ for each } \mathbf{x} \in V, \qquad [T(\mathbf{x})]_\beta \text{ and } [T(\mathbf{x})]_{\beta'} \text{ in } \mathbb{R}^m \text{ for } T(\mathbf{x}) \in W.$$


Figure 3.8: Generating similar matrices $[T]_\alpha^\beta$ and $[T]_{\alpha'}^{\beta'}$


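They are related by the coordinate change matrices as follows:

$$[\mathbf{x}]_{\alpha'} = [\mathrm{Id}_V]_\alpha^{\alpha'} [\mathbf{x}]_\alpha \quad \text{and} \quad [T(\mathbf{x})]_{\beta'} = [\mathrm{Id}_W]_\beta^{\beta'} [T(\mathbf{x})]_\beta.$$

On the other hand, by Theorem 3.11, we have

$$[T(\mathbf{x})]_\beta = [T]_\alpha^\beta [\mathbf{x}]_\alpha \quad \text{and} \quad [T(\mathbf{x})]_{\beta'} = [T]_{\alpha'}^{\beta'} [\mathbf{x}]_{\alpha'}.$$

Therefore, we get

$$[T]_{\alpha'}^{\beta'} [\mathbf{x}]_{\alpha'} = [T(\mathbf{x})]_{\beta'} = [\mathrm{Id}_W]_\beta^{\beta'} [T]_\alpha^\beta [\mathbf{x}]_\alpha = [\mathrm{Id}_W]_\beta^{\beta'} [T]_\alpha^\beta [\mathrm{Id}_V]_{\alpha'}^\alpha [\mathbf{x}]_{\alpha'}.$$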




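Actually, from Theorem 3.16, this relation can be obtained directly from $T = \mathrm{Id}_W \circ T \circ \mathrm{Id}_V$ as

$$[T]_{\alpha'}^{\beta'} = [\mathrm{Id}_W \circ T \circ \mathrm{Id}_V]_{\alpha'}^{\beta'} = [\mathrm{Id}_W]_\beta^{\beta'} [T]_\alpha^\beta [\mathrm{Id}_V]_{\alpha'}^\alpha.$$

Note that $[T]_\alpha^\beta$ and $[T]_{\alpha'}^{\beta'}$ are $m \times n$ matrices, $[\mathrm{Id}_V]_{\alpha'}^\alpha$ is an $n \times n$ matrix and $[\mathrm{Id}_W]_\beta^{\beta'}$ is an $m \times m$ matrix.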

The relation can also be incorporated in the commutative diagrams as 
shown in Figure 3.8. 


Our discussion is summarized in the following theorem. 


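Theorem 3.18 Let $T : V \to W$ be a linear transformation from a vector space $V$ with bases $\alpha$ and $\alpha'$ to another vector space $W$ with bases $\beta$ and $\beta'$. Then

$$[T]_{\alpha'}^{\beta'} = P^{-1} [T]_\alpha^\beta Q,$$

where $Q = [\mathrm{Id}_V]_{\alpha'}^\alpha$ and $P = [\mathrm{Id}_W]_{\beta'}^\beta$ are the coordinate change matrices.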


In particular, if we take $W = V$, $\beta = \alpha$ and $\beta' = \alpha'$, then $P = Q$ and we get the following corollary.


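Corollary 3.19 Let $T : V \to V$ be a linear transformation on a vector space $V$, and let $\alpha$ and $\beta$ be ordered bases for $V$. Let $Q = [\mathrm{Id}]_\beta^\alpha$ be the transition matrix from $\beta$ to $\alpha$. Then

(1) $Q$ is invertible, and $Q^{-1} = [\mathrm{Id}]_\alpha^\beta$.

(2) For any $\mathbf{x} \in V$, $[\mathbf{x}]_\alpha = Q[\mathbf{x}]_\beta$.

(3) $[T]_\beta = Q^{-1}[T]_\alpha Q$.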


Relation (3) between $[T]_\beta$ and $[T]_\alpha$ in Corollary 3.19 is called a similarity. In general, we have the following definition.

Definition 3.8 For any square matrices $A$ and $B$, $A$ is said to be similar to $B$ if there exists a nonsingular matrix $Q$ such that $B = Q^{-1}AQ$.

Note that if $A$ is similar to $B$, then $B$ is also similar to $A$, since $A = (Q^{-1})^{-1} B Q^{-1}$. Thus we simply say that $A$ and $B$ are similar. We saw in Corollary 3.19 that if $A$ and $B$ are $n \times n$ matrices representing the same linear transformation $T$, then $A$ and $B$ are similar.




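Example 3.19 Let $\beta = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be a basis for the 3-space $\mathbb{R}^3$ consisting of $\mathbf{v}_1 = (1, 1, 0)$, $\mathbf{v}_2 = (1, 0, 1)$ and $\mathbf{v}_3 = (0, 1, 1)$. Let $T$ be the linear transformation on $\mathbb{R}^3$ given by the matrix

$$[T]_\beta = \begin{bmatrix} 2 & 1 & -1 \\ 1 & 2 & 3 \\ -1 & 1 & 1 \end{bmatrix}.$$

Let $\alpha = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ be the standard basis. Find the coordinate change matrix $[\mathrm{Id}]_\beta^\alpha$ and $[T]_\alpha$.

Solution: Since $\mathbf{v}_1 = \mathbf{e}_1 + \mathbf{e}_2$, $\mathbf{v}_2 = \mathbf{e}_1 + \mathbf{e}_3$, $\mathbf{v}_3 = \mathbf{e}_2 + \mathbf{e}_3$, we have

$$[\mathrm{Id}]_\beta^\alpha = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \quad \text{and} \quad [\mathrm{Id}]_\alpha^\beta = \big([\mathrm{Id}]_\beta^\alpha\big)^{-1} = \frac{1}{2}\begin{bmatrix} 1 & 1 & -1 \\ 1 & -1 & 1 \\ -1 & 1 & 1 \end{bmatrix}.$$

Therefore,

$$[T]_\alpha = [\mathrm{Id}]_\beta^\alpha [T]_\beta [\mathrm{Id}]_\alpha^\beta = \frac{1}{2}\begin{bmatrix} 4 & 2 & 2 \\ 3 & -1 & 1 \\ -1 & 1 & 7 \end{bmatrix}.$$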
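The arithmetic of Example 3.19 can be reproduced with exact fractions. This sketch (hypothetical helper names) multiplies $[\mathrm{Id}]_\beta^\alpha [T]_\beta [\mathrm{Id}]_\alpha^\beta$ and prints twice the result to avoid fractions in the output.

```python
from fractions import Fraction


def matmul(A, B):
    """Matrix product of A and B, both stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


T_beta = [[2, 1, -1], [1, 2, 3], [-1, 1, 1]]
Q = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]           # [Id]_beta^alpha: columns v1, v2, v3
half = Fraction(1, 2)
Q_inv = [[half * e for e in row]                 # [Id]_alpha^beta
         for row in [[1, 1, -1], [1, -1, 1], [-1, 1, 1]]]

assert matmul(Q, Q_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
T_alpha = matmul(matmul(Q, T_beta), Q_inv)       # [T]_alpha = Q [T]_beta Q^{-1}
print([[int(2 * e) for e in row] for row in T_alpha])  # [[4, 2, 2], [3, -1, 1], [-1, 1, 7]]
```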


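Example 3.20 Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by

$$T(x_1, x_2, x_3) = (2x_1 + x_2,\ x_1 + x_2 + 3x_3,\ -x_2).$$

Let $\alpha = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ be the standard ordered basis for $\mathbb{R}^3$, and let $\beta = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be another ordered basis consisting of $\mathbf{v}_1 = (-1, 0, 0)$, $\mathbf{v}_2 = (2, 1, 0)$ and $\mathbf{v}_3 = (1, 1, 1)$. Find the associated matrices $[T]_\alpha$ and $[T]_\beta$ for $T$. Also, show that $T(\mathbf{v}_j)$ is the linear combination of the basis vectors in $\beta$ with the entries of the $j$-th column of $[T]_\beta$ as its coefficients for $j = 1, 2, 3$.

Solution: One can easily show that

$$[T]_\alpha = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 1 & 3 \\ 0 & -1 & 0 \end{bmatrix} \quad \text{and} \quad Q = [\mathrm{Id}]_\beta^\alpha = \begin{bmatrix} -1 & 2 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$

Thus, with the inverse $Q^{-1} = \begin{bmatrix} -1 & 2 & -1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}$, one shows that

$$[T]_\beta = Q^{-1}[T]_\alpha Q = \begin{bmatrix} 0 & 2 & 8 \\ -1 & 4 & 6 \\ 0 & -1 & -1 \end{bmatrix}.$$

To show the second statement, let $j = 2$. Then $T(\mathbf{v}_2) = T(2, 1, 0) = (5, 3, -1)$. On the other hand, the coefficients of $[T(\mathbf{v}_2)]_\beta$ are just the entries of the second column of $[T]_\beta$. Therefore,

$$T(\mathbf{v}_2) = 2\mathbf{v}_1 + 4\mathbf{v}_2 - \mathbf{v}_3 = 2(-1, 0, 0) + 4(2, 1, 0) - (1, 1, 1) = (5, 3, -1),$$

as expected.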
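A short sketch verifying Example 3.20, including the column-by-column statement for every $j$, not just $j = 2$. Helper names are hypothetical; all arithmetic is on integers, so the comparisons are exact.

```python
def matmul(A, B):
    """Matrix product of A and B, both stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


T_alpha = [[2, 1, 0], [1, 1, 3], [0, -1, 0]]
Q = [[-1, 2, 1], [0, 1, 1], [0, 0, 1]]           # [Id]_beta^alpha
Q_inv = [[-1, 2, -1], [0, 1, -1], [0, 0, 1]]
assert matmul(Q_inv, Q) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

T_beta = matmul(matmul(Q_inv, T_alpha), Q)
print(T_beta)  # [[0, 2, 8], [-1, 4, 6], [0, -1, -1]]


def T(x):
    """T(x1, x2, x3) = (2x1 + x2, x1 + x2 + 3x3, -x2)."""
    return (2 * x[0] + x[1], x[0] + x[1] + 3 * x[2], -x[1])


basis = [(-1, 0, 0), (2, 1, 0), (1, 1, 1)]       # v1, v2, v3
for j, v in enumerate(basis):
    # Combine v1, v2, v3 with the entries of column j of [T]_beta.
    combo = tuple(sum(T_beta[i][j] * basis[i][k] for i in range(3)) for k in range(3))
    assert combo == T(v)
```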


The next theorem shows that two similar matrices can be matrix repre- 
sentations of the same linear transformation. 


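Theorem 3.20 Suppose that an $n \times n$ matrix $A$ represents a linear transformation $T : V \to V$ on a vector space $V$ with respect to a basis $\alpha = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$, i.e., $[T]_\alpha = A$. If $B = Q^{-1}AQ$ for some nonsingular matrix $Q$, then there exists a basis $\beta$ for $V$ such that $B = [T]_\beta$ and $Q = [\mathrm{Id}]_\beta^\alpha$.

Proof: Let $Q = [q_{ij}]$ and let $\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n$ be the vectors in $V$ defined by

$$\begin{aligned} \mathbf{w}_1 &= q_{11}\mathbf{v}_1 + q_{21}\mathbf{v}_2 + \cdots + q_{n1}\mathbf{v}_n, \\ \mathbf{w}_2 &= q_{12}\mathbf{v}_1 + q_{22}\mathbf{v}_2 + \cdots + q_{n2}\mathbf{v}_n, \\ &\;\;\vdots \\ \mathbf{w}_n &= q_{1n}\mathbf{v}_1 + q_{2n}\mathbf{v}_2 + \cdots + q_{nn}\mathbf{v}_n. \end{aligned}$$

Then the nonsingularity of $Q = [q_{ij}]$ implies that $\beta = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n\}$ is an ordered basis for $V$, and, since $Q = [\mathrm{Id}]_\beta^\alpha$ by construction, Corollary 3.19 (3) shows that $[T]_\beta = Q^{-1}[T]_\alpha Q = Q^{-1}AQ = B$.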


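Example 3.21 Let $D$ be the differential operator on the vector space $P_2(\mathbb{R})$. For the basis $\alpha = \{1, x, x^2\}$, note first that

$$\begin{aligned} D(1) &= 0 = 0 \cdot 1 + 0 \cdot x + 0 \cdot x^2, \\ D(x) &= 1 = 1 \cdot 1 + 0 \cdot x + 0 \cdot x^2, \\ D(x^2) &= 2x = 0 \cdot 1 + 2 \cdot x + 0 \cdot x^2. \end{aligned}$$

Hence, the matrix representation of $D$ with respect to $\alpha$ is given by

$$[D]_\alpha = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 2 \\ 0 & 0 & 0 \end{bmatrix}.$$

Choose a nonsingular matrix

$$Q = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 2 & 0 \\ 0 & 0 & 4 \end{bmatrix}, \quad \text{with its inverse} \quad Q^{-1} = \frac{1}{4}\begin{bmatrix} 4 & 0 & 2 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

Let

$$B = Q^{-1}[D]_\alpha Q = \begin{bmatrix} 0 & 2 & 0 \\ 0 & 0 & 4 \\ 0 & 0 & 0 \end{bmatrix}.$$

Now we are going to find a basis $\beta = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ such that $B = [D]_\beta$. If such a basis exists, the matrix $Q$ must be the transition matrix $[\mathrm{Id}]_\beta^\alpha$, and then

$$\begin{aligned} \mathbf{v}_1 &= 1 \cdot 1 + 0 \cdot x + 0 \cdot x^2 = 1, \\ \mathbf{v}_2 &= 0 \cdot 1 + 2 \cdot x + 0 \cdot x^2 = 2x, \\ \mathbf{v}_3 &= -2 \cdot 1 + 0 \cdot x + 4 \cdot x^2 = -2 + 4x^2. \end{aligned}$$

One can then check directly that

$$\begin{aligned} D(1) &= 0 = 0 \cdot 1 + 0 \cdot 2x + 0 \cdot (-2 + 4x^2), \\ D(2x) &= 2 = 2 \cdot 1 + 0 \cdot 2x + 0 \cdot (-2 + 4x^2), \\ D(-2 + 4x^2) &= 8x = 0 \cdot 1 + 4 \cdot 2x + 0 \cdot (-2 + 4x^2), \end{aligned}$$

and

$$[D]_\beta = \begin{bmatrix} 0 & 2 & 0 \\ 0 & 0 & 4 \\ 0 & 0 & 0 \end{bmatrix},$$

as expected: $[D]_\beta = B = Q^{-1}[D]_\alpha Q$.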
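Example 3.21 can also be verified without matrices by representing polynomials in $P_2(\mathbb{R})$ by their coefficient lists $[c_0, c_1, c_2]$. In this sketch, `D` and `coords_in_beta` are hypothetical helpers; the latter solves for coordinates in $\beta = \{1, 2x, -2 + 4x^2\}$ by back-substitution.

```python
from fractions import Fraction


def D(p):
    """Derivative of c0 + c1 x + c2 x^2, given as the list [c0, c1, c2]."""
    return [p[1], 2 * p[2], 0]


def coords_in_beta(p):
    """Coordinates of p = c0 + c1 x + c2 x^2 in beta = {1, 2x, -2 + 4x^2}."""
    c0, c1, c2 = p
    a3 = Fraction(c2, 4)   # only v3 = -2 + 4x^2 contributes to x^2
    a2 = Fraction(c1, 2)   # only v2 = 2x contributes to x
    a1 = c0 + 2 * a3       # constant term: a1 - 2*a3 = c0
    return [a1, a2, a3]


beta = [[1, 0, 0], [0, 2, 0], [-2, 0, 4]]   # v1 = 1, v2 = 2x, v3 = -2 + 4x^2
cols = [coords_in_beta(D(v)) for v in beta]
D_beta = [[cols[j][i] for j in range(3)] for i in range(3)]
print(D_beta == [[0, 2, 0], [0, 0, 4], [0, 0, 0]])  # True
```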


Problem 3.18 Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by

$$T(x_1, x_2, x_3) = (x_1 + 2x_2 + x_3,\ -x_2,\ x_1 + 4x_3).$$

Let $\alpha$ be the standard basis, and let $\beta = \{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3\}$ be another basis for $\mathbb{R}^3$ consisting of $\mathbf{v}_1 = (1, 0, 0)$, $\mathbf{v}_2 = (1, 1, 0)$ and $\mathbf{v}_3 = (1, 1, 1)$. Find the associated matrix of $T$ with respect to $\alpha$ and the associated matrix of $T$ with respect to $\beta$. Are they similar?


Problem 3.19 Suppose that A and B are similar n x n matrices. Show that 
(1) det A = det B, 
(2) tr(A) = tr(B), 
(3) rank A = rank B. 


Problem 3.20 Let $A$ and $B$ be $n \times n$ matrices. Show that if $A$ is similar to $B$, then $A^2$ is similar to $B^2$.




3.8 Dual spaces 


Note that the space of all scalars is the one-dimensional vector space $\mathbb{R}$, and the set of all linear transformations from $V$ to $\mathbb{R}$ is the vector space $L(V; \mathbb{R})$. When $V$ is finite-dimensional, its dimension is equal to the dimension of $V$ (by Corollary 3.15, $\dim L(V; \mathbb{R}) = \dim V \cdot 1$), so that the two vector spaces $L(V; \mathbb{R})$ and $V$ are isomorphic. In this section, we are concerned exclusively with such linear transformations from $V$ to the scalar space $\mathbb{R}$.


Definition 3.9 Let V be a vector space. 


(1) The vector space $L(V; \mathbb{R})$ of all linear transformations from $V$ to $\mathbb{R}$ is called the dual space of $V$ and is denoted by $V^* = L(V; \mathbb{R})$.

(2) An element (i.e., a linear transformation) of the vector space $L(V; \mathbb{R})$ is called a linear functional on $V$.


From the discussion above, any finite-dimensional vector space is isomorphic to its dual space.


Example 3.22 Let $C[a, b]$ be the vector space of all continuous real-valued functions on the interval $[a, b]$. The definite integral $\mathcal{I} : C[a, b] \to \mathbb{R}$ defined by

$$\mathcal{I}(f) = \int_a^b f(t)\,dt$$

is a linear functional on $C[a, b]$. In particular, if the interval is $[0, 2\pi]$ and $n$ is an integer, then

$$A_n(f) = \frac{1}{\pi}\int_0^{2\pi} f(t)\cos nt\,dt \quad \text{and} \quad B_n(f) = \frac{1}{\pi}\int_0^{2\pi} f(t)\sin nt\,dt$$

are linear functionals, called the $n$-th Fourier coefficients of $f$.
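The Fourier coefficient $A_n$ of Example 3.22 can be approximated numerically, which also makes its linearity visible. This is a sketch, not part of the text: it assumes an equally spaced $N$-point sum (the trapezoidal rule for a $2\pi$-periodic integrand, which is very accurate for smooth periodic functions), with $N = 1000$ and the test functions chosen arbitrarily.

```python
import math


def fourier_A(f, n, N=1000):
    """Approximate A_n(f) = (1/pi) * integral_0^{2pi} f(t) cos(nt) dt by an
    equally spaced N-point sum (the trapezoidal rule for a 2pi-periodic integrand)."""
    h = 2 * math.pi / N
    return sum(f(k * h) * math.cos(n * k * h) for k in range(N)) * h / math.pi


def f(t):
    return math.cos(t)


def g(t):
    return t * (2 * math.pi - t)   # continuous on [0, 2pi], with g(0) = g(2pi)


print(abs(fourier_A(f, 1) - 1) < 1e-9)  # True: A_1(cos) = 1

# Linearity of the functional A_2: A_2(f + 3g) = A_2(f) + 3 A_2(g), up to round-off.
lhs = fourier_A(lambda t: f(t) + 3 * g(t), 2)
rhs = fourier_A(f, 2) + 3 * fourier_A(g, 2)
print(abs(lhs - rhs) < 1e-9)  # True
```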


Example 3.23 The trace function $\mathrm{tr} : M_{n \times n}(\mathbb{R}) \to \mathbb{R}$ is a linear functional on $M_{n \times n}(\mathbb{R})$, so that $\mathrm{tr} \in M_{n \times n}(\mathbb{R})^*$.


For a matrix $A$ regarded as a linear transformation $A : \mathbb{R}^n \to \mathbb{R}^m$, the transpose $A^T$ of $A$ is another linear transformation $A^T : \mathbb{R}^m \to \mathbb{R}^n$. In general, for a linear transformation $T : V \to W$ from a vector space $V$ to $W$, one can naturally ask what the meaning of the transpose is. In the following, we try to answer this question.




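Recall that a linear functional $T : V \to \mathbb{R}$ is completely determined by its values on a basis for $V$. Let $\alpha = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for $V$. For each $i = 1, 2, \ldots, n$, define a linear functional

$$\mathbf{v}_i^* : V \to \mathbb{R}$$

by $\mathbf{v}_i^*(\mathbf{v}_j) = \delta_{ij}$ for each $j = 1, 2, \ldots, n$. Then, for any $\mathbf{x} = \sum_j a_j\mathbf{v}_j \in V$, we have $\mathbf{v}_i^*(\mathbf{x}) = a_i$, which is the $i$-th coordinate of $\mathbf{x}$ with respect to $\alpha$. Thus the functional $\mathbf{v}_i^*$ is called the $i$-th coordinate function with respect to the basis $\alpha$.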


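Theorem 3.21 The set $\alpha^* = \{\mathbf{v}_1^*, \mathbf{v}_2^*, \ldots, \mathbf{v}_n^*\}$ of coordinate functions forms a basis for the dual space $V^*$, and for any $T \in V^*$ we have

$$T = \sum_{i=1}^{n} T(\mathbf{v}_i)\,\mathbf{v}_i^*.$$

Proof: The set $\alpha^*$ is linearly independent, since $0 = \sum_i c_i\mathbf{v}_i^*$ implies $0 = \sum_i c_i\mathbf{v}_i^*(\mathbf{v}_j) = c_j$ for each $j = 1, \ldots, n$. Since $\dim V^* = \dim V = n$, these $n$ linearly independent vectors in $\alpha^*$ must span $V^*$, so that $\alpha^*$ forms a basis. Moreover, for any $T \in V^*$, let $T = \sum_i c_i\mathbf{v}_i^*$. Then $T(\mathbf{v}_j) = \sum_i c_i\mathbf{v}_i^*(\mathbf{v}_j) = c_j$. This gives $T = \sum_j T(\mathbf{v}_j)\mathbf{v}_j^*$.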


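Definition 3.10 For a basis $\alpha = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ for a vector space $V$, the basis $\alpha^* = \{\mathbf{v}_1^*, \mathbf{v}_2^*, \ldots, \mathbf{v}_n^*\}$ for $V^*$ is called the dual basis of $\alpha$.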


Theorem 3.21 says that, for a vector space $V$ with a fixed basis $\alpha = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$, one can define a linear transformation $* : V \to V^*$ by $*(\mathbf{v}_i) = \mathbf{v}_i^*$ (extended linearly), which becomes an isomorphism between $V$ and $V^*$. Therefore, we have the following corollary.


Corollary 3.22 Any finite-dimensional vector space is isomorphic to its 
dual space. 


Example 3.24 Let $\alpha = \{v_1, v_2\}$, where $v_1 = (1, 2)$ and $v_2 = (1, 3)$, be a basis for $\mathbb{R}^2$. To find its dual basis $\alpha^* = \{v_1^*, v_2^*\}$, we consider the equations
$$\begin{aligned} 1 &= v_1^*(v_1) = v_1^*(e_1) + 2v_1^*(e_2),\\ 0 &= v_1^*(v_2) = v_1^*(e_1) + 3v_1^*(e_2). \end{aligned}$$
Solving these equations, we obtain $v_1^*(e_1) = 3$ and $v_1^*(e_2) = -1$. Thus $v_1^*(x, y) = 3x - y$. Similarly, it can be shown that $v_2^*(x, y) = -2x + y$.
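Example 3.24 is a small linear system. Equivalently, if the basis vectors are placed as the columns of a matrix $P$, then the rows of $P^{-1}$ are the coefficient vectors of the dual functionals, because $P^{-1}P = I$ says exactly $v_i^*(v_j) = \delta_{ij}$. The following is a minimal Python sketch of that observation (exact fractions; the helper names are our own, not from the text). It reproduces Example 3.24 and also checks the expansion $T = \sum_i T(v_i) v_i^*$ of Theorem 3.21 for a sample functional $T(x, y) = 5x + 7y$, chosen only for illustration.

```python
from fractions import Fraction

def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions."""
    n = len(m)
    # Augmented matrix [m | I].
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        # Nonzero pivot in this column (StopIteration if m is singular).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def dual_basis(basis):
    """Coefficient rows c_i with v_i^*(x) = c_i . x, for a basis of R^n."""
    n = len(basis)
    P = [[basis[j][i] for j in range(n)] for i in range(n)]  # columns are the v_j
    return inverse(P)

def evaluate(coeffs, x):
    """Value at x of the functional whose coefficient vector is coeffs."""
    return sum(c * xi for c, xi in zip(coeffs, x))

alpha = [(1, 2), (1, 3)]        # the basis of Example 3.24
dual = dual_basis(alpha)
print(dual)                     # rows for 3x - y and -2x + y

# Theorem 3.21: T = T(v_1) v_1^* + T(v_2) v_2^* for T(x, y) = 5x + 7y.
T = (5, 7)
rebuilt = [sum(evaluate(T, v) * d[k] for v, d in zip(alpha, dual)) for k in range(2)]
print(rebuilt)                  # the coefficients of T again
```

Using exact fractions rather than floating point keeps the dual-basis coefficients identical to the hand computation.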


3.8. Dual spaces 133 


Example 3.25 Consider $V = \mathbb{R}^n$ with the standard basis $\alpha = \{e_1, e_2, \ldots, e_n\}$, and its dual basis $\alpha^* = \{e_1^*, e_2^*, \ldots, e_n^*\}$ for $\mathbb{R}^{n*}$. Then for a vector $a = (a_1, a_2, \ldots, a_n) = a_1e_1 + a_2e_2 + \cdots + a_ne_n \in \mathbb{R}^n$, we have $e_i^*(a) = e_i^*(a_1e_1 + a_2e_2 + \cdots + a_ne_n) = a_i$ for all $i$.

On the other hand, when we write a vector in $\mathbb{R}^n$ as $x = (x_1, x_2, \ldots, x_n)$ in coordinate functions (or unknowns) $x_i$, it means that, given a point $a = (a_1, a_2, \ldots, a_n) \in \mathbb{R}^n$, each $x_i$ gives us the $i$-th coordinate of $a$, that is, $x_i(a) = a_i$ for all $i$. In this sense, one can identify $e_i^* = x_i$ for $i = 1, 2, \ldots, n$, so that $\mathbb{R}^{n*} = \mathbb{R}^n$, and the $x_i$ are called coordinate functions.


Problem 3.21 Let $\alpha = \{(1, 0, 1), (1, 2, 1), (0, 0, 1)\}$ be a basis for $\mathbb{R}^3$. Find the dual basis $\alpha^*$.


Now, consider two vector spaces $V$ and $W$ with fixed bases $\alpha$ and $\beta$, respectively. Let $T : V \to W$ be a linear transformation from $V$ to $W$. Then there is a transformation $T^* : W^* \to V^*$ defined naturally by $T^*(g) = g\circ T$ for any $g \in W^*$. In fact, for any linear functional $g \in W^*$, i.e., $g : W \to \mathbb{R}$, the composition $g\circ T : V \to \mathbb{R}$ given by $(g\circ T)(x) = g(T(x))$ for $x \in V$ defines a linear functional on $V$, i.e., $T^*(g) = g\circ T \in V^*$.


Lemma 3.23 The transformation $T^* : W^* \to V^*$ defined by $T^*(g) = g\circ T$ for $g \in W^*$ is a linear transformation. It is called the adjoint (or transpose) of $T$.


Proof: For any $f, g \in W^*$, $a, b \in \mathbb{R}$ and $x \in V$,
$$T^*(af + bg)(x) = af(T(x)) + bg(T(x)) = (aT^*(f) + bT^*(g))(x).$$


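Example 3.26 Let $\mathrm{Id} : V \to V$ be the identity transformation on a vector space $V$, and let $\alpha = \{v_1, v_2, \ldots, v_n\}$ be a basis for $V$. Then $\mathrm{Id}^*(v_i^*)(v_j) = v_i^*(\mathrm{Id}(v_j)) = v_i^*(v_j)$ for all $j = 1, 2, \ldots, n$, so $\mathrm{Id}^*(v_i^*) = v_i^*$ for each $i$. This implies that the adjoint $\mathrm{Id}^* : V^* \to V^*$ is also the identity transformation on $V^*$, i.e., $\mathrm{Id}^* = \mathrm{Id}$.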


Let $S : U \to V$ and $T : V \to W$ be two linear transformations. Then for any $g \in W^*$ and $x \in U$,
$$(T\circ S)^*(g)(x) = g\circ(T\circ S)(x) = g(T(S(x))) = T^*(g)(S(x)) = S^*(T^*(g))(x) = (S^*\circ T^*)(g)(x),$$
which shows that
$$(T\circ S)^* = S^*\circ T^*.$$




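Hence, if $S : V \to W$ is an isomorphism, so that $S\circ S^{-1} = \mathrm{Id}$ and $S^{-1}\circ S = \mathrm{Id}$, then $(S^{-1})^*\circ S^* = (S\circ S^{-1})^* = \mathrm{Id}^* = \mathrm{Id}$ and, similarly, $S^*\circ(S^{-1})^* = \mathrm{Id}$. These show that $S^* : W^* \to V^*$ is also an isomorphism, with inverse $(S^{-1})^*$.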

Note that the linear transformation $* : V \to V^*$ defined by assigning a basis for $V$ to its dual basis is an isomorphism, so that the composition $V \to V^* \to V^{**}$ of two such maps (the second one using the basis $\alpha^*$ of $V^*$) is also an isomorphism. However, an isomorphism between $V$ and $V^{**}$ can be defined without choosing a basis for $V$. In fact, for each $x \in V$, one can first define $\hat{x} : V^* \to \mathbb{R}$ by $\hat{x}(f) = f(x)$ for every $f \in V^*$. It is easy to verify that $\hat{x}$ is a linear functional on $V^*$, so that $\hat{x} \in V^{**}$. The following theorem shows that the mapping $\Phi : V \to V^{**}$ defined by $\Phi(x) = \hat{x}$ is an isomorphism that does not depend on the choice of a basis for $V$; it is called a canonical isomorphism.


Theorem 3.24 The mapping $\Phi : V \to V^{**}$ defined by $\Phi(x) = \hat{x}$ is an isomorphism from $V$ to $V^{**}$.


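Proof: To show the linearity of $\Phi$, let $x, y \in V$ and $k$ a scalar. Then, for any $f \in V^*$,
$$\begin{aligned} \Phi(x + ky)(f) &= \widehat{x + ky}(f) = f(x + ky)\\ &= f(x) + kf(y) = \hat{x}(f) + k\hat{y}(f)\\ &= (\hat{x} + k\hat{y})(f) = (\Phi(x) + k\Phi(y))(f). \end{aligned}$$
Hence, $\Phi(x + ky) = \Phi(x) + k\Phi(y)$.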

To show that $\Phi$ is injective, suppose $x \in \mathrm{Ker}(\Phi)$. Then $\Phi(x) = \hat{x} = 0$ in $V^{**}$, i.e., $\hat{x}(f) = 0$ for all $f \in V^*$. This implies that $x = 0$. In fact, if $x \ne 0$, one can choose a basis $\alpha = \{v_1, v_2, \ldots, v_n\}$ for $V$ such that $v_1 = x$. Let $\alpha^* = \{v_1^*, v_2^*, \ldots, v_n^*\}$ be the dual basis of $\alpha$. Then
$$0 = \hat{x}(v_1^*) = v_1^*(x) = v_1^*(v_1) = 1,$$
which is a contradiction. Thus, $x = 0$ and $\mathrm{Ker}(\Phi) = \{0\}$. Since $\dim V = \dim V^{**}$, $\Phi$ is an isomorphism.
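The canonical map $\Phi$ can be mimicked by modelling functionals as Python functions: $\Phi(x)$ is simply "evaluate at $x$", and no basis appears anywhere. The sketch below is illustrative only; the sample functionals `f`, `g` and vectors are our own choices, and the checks are a finite spot check of linearity on these samples, not a proof.

```python
# Vectors of R^3 are tuples; linear functionals on R^3 are plain Python functions.
def phi(x):
    """Canonical map V -> V**: x goes to x_hat, where x_hat(f) = f(x). No basis is used."""
    return lambda f: f(x)

def add(u, v, k=1):
    """The vector u + k v, computed componentwise."""
    return tuple(a + k * b for a, b in zip(u, v))

# Sample functionals and vectors (illustrative choices only).
f = lambda v: v[0] - 2 * v[1]
g = lambda v: v[1] - 3 * v[2]
x, y, k = (1, 2, 3), (0, -1, 4), 3

# Linearity of Phi, tested on each sample functional h.
for h in (f, g):
    print(phi(add(x, y, k))(h) == phi(x)(h) + k * phi(y)(h))   # True

# Each x_hat is itself linear on V*.
f_plus_kg = lambda v: f(v) + k * g(v)
print(phi(x)(f_plus_kg) == phi(x)(f) + k * phi(x)(g))          # True
```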


Problem 3.22 Let $V = \mathbb{R}^3$ and define $f_i \in V^*$ as follows:
$$f_1(x, y, z) = x - 2y, \quad f_2(x, y, z) = x + y + z, \quad f_3(x, y, z) = y - 3z.$$
Prove that $\{f_1, f_2, f_3\}$ is a basis for $V^*$, and then find a basis for $V$ for which it is the dual.


We now consider the matrix representation of the transpose $S^* : W^* \to V^*$ of a linear transformation $S : V \to W$. Let $\alpha = \{v_1, v_2, \ldots, v_n\}$ and $\beta = \{w_1, w_2, \ldots, w_m\}$ be bases for $V$ and $W$ with their dual bases $\alpha^* = \{v_1^*, v_2^*, \ldots, v_n^*\}$ and $\beta^* = \{w_1^*, w_2^*, \ldots, w_m^*\}$, respectively.




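Theorem 3.25 The matrix representation of the transpose $S^* : W^* \to V^*$ is the transpose of the matrix representation of $S : V \to W$, that is,
$$[S^*]_{\beta^*}^{\alpha^*} = \left([S]_\alpha^\beta\right)^T.$$

Proof: Let $S(v_i) = \sum_{k=1}^m a_{ki} w_k$, so that
$$[S]_\alpha^\beta = \big[\,[S(v_1)]_\beta \ \cdots\ [S(v_n)]_\beta\,\big] = [a_{ki}].$$
Then
$$[S^*]_{\beta^*}^{\alpha^*} = \big[\,[S^*(w_1^*)]_{\alpha^*} \ \cdots\ [S^*(w_m^*)]_{\alpha^*}\,\big].$$
Note that, for $1 \le j \le m$, Theorem 3.21 gives
$$S^*(w_j^*) = \sum_{i=1}^n S^*(w_j^*)(v_i)\, v_i^* = \sum_{i=1}^n a_{ji}\, v_i^*,$$
since
$$S^*(w_j^*)(v_i) = (w_j^*\circ S)(v_i) = w_j^*(S(v_i)) = w_j^*\Big(\sum_{k=1}^m a_{ki} w_k\Big) = \sum_{k=1}^m a_{ki}\, w_j^*(w_k) = a_{ji}.$$
Hence, the $j$-th column of $[S^*]_{\beta^*}^{\alpha^*}$ is the $j$-th row of $[S]_\alpha^\beta$, and we get $[S^*]_{\beta^*}^{\alpha^*} = \left([S]_\alpha^\beta\right)^T$.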
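Theorem 3.25 can be checked numerically by building $[S^*]$ literally from its definition, one column at a time: column $j$ holds the values of $w_j^*\circ S$ on $e_1, \ldots, e_n$. The sketch below assumes standard bases on both sides and represents functionals as Python functions; the helper names and the sample matrix are our own.

```python
def matvec(A, x):
    """Product A x for a matrix A (list of rows) and a vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def adjoint_matrix(A):
    """Matrix of S* : W* -> V* in the standard dual bases, where S(v) = A v.

    Column j is the coordinate vector of S*(w_j^*) = w_j^* o S in alpha^*,
    i.e. the values (w_j^* o S)(e_i) for i = 1, ..., n.
    """
    m, n = len(A), len(A[0])
    basis = [[int(i == k) for k in range(n)] for i in range(n)]  # e_1, ..., e_n of R^n
    columns = []
    for j in range(m):
        w_star = lambda w, j=j: w[j]                      # j-th coordinate functional on R^m
        pullback = lambda v, g=w_star: g(matvec(A, v))    # S*(w_j^*) = w_j^* o S
        columns.append([pullback(e) for e in basis])
    return transpose(columns)                             # list of columns -> n x m matrix

A = [[1, 2, 3],
     [4, 5, 6]]            # S : R^3 -> R^2, chosen for illustration
print(adjoint_matrix(A))   # [[1, 4], [2, 5], [3, 6]]
print(transpose(A))        # the same matrix, as Theorem 3.25 predicts
```

The default arguments `j=j` and `g=w_star` bind the current loop values; without them every lambda would see the last `j`.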


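Example 3.27 Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear transformation defined by an $m\times n$ matrix $A$. Let $\alpha$ and $\beta$ be the standard bases for $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. Then $[A]_\alpha^\beta = A$. By Theorem 3.25, we have $[A^*]_{\beta^*}^{\alpha^*} = \left([A]_\alpha^\beta\right)^T = A^T$. Thus, with the identifications $\mathbb{R}^{n*} = \mathbb{R}^n$ and $\mathbb{R}^{m*} = \mathbb{R}^m$ via $\alpha^* = \alpha$ and $\beta^* = \beta$ as in Example 3.25, we have $[A^*]_{\beta^*}^{\alpha^*} = A^*$ and $A^* = A^T$. In this sense, the transpose $A^T$ is the adjoint transformation of $A$.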


As the final part of the section, we consider the dual space of a subspace. Let $V$ be a vector space of dimension $n$, and let $U$ be a subspace of $V$ of dimension $k$. Then $U^* = \{T : U \to \mathbb{R} \mid T \text{ is linear on } U\}$ is not a subspace of $V^*$. However, one can extend each $T \in U^*$ to a linear functional on $V$ as follows. Choose a basis $\alpha = \{u_1, u_2, \ldots, u_k\}$ for $U$. Then clearly $\alpha^* = \{u_1^*, u_2^*, \ldots, u_k^*\}$ is its dual basis for $U^*$. Now extend $\alpha$ to a basis




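$\beta = \{u_1, u_2, \ldots, u_k, u_{k+1}, \ldots, u_n\}$ for $V$. For each $T \in U^*$, let $\tilde{T} : V \to \mathbb{R}$ be the linear functional on $V$ defined by
$$\tilde{T}(u_i) = \begin{cases} T(u_i) & \text{if } i \le k,\\ 0 & \text{if } k+1 \le i \le n. \end{cases}$$
Then clearly $\tilde{T} \in V^*$, and the restriction $\tilde{T}|_U$ of $\tilde{T}$ to $U$ is simply $T$, i.e., $\tilde{T}|_U = T \in U^*$. It is easy to see that $\widetilde{T + kS} = \tilde{T} + k\tilde{S}$. In particular, it is also easy to see that $\{\tilde{u}_1^*, \tilde{u}_2^*, \ldots, \tilde{u}_k^*\}$ is linearly independent in $V^*$ and $\tilde{u}_i^*|_U = u_i^* \in U^*$, $i = 1, 2, \ldots, k$. Therefore, one obtains a one-to-one linear transformation
$$\varphi : U^* \to V^*$$
given by $\varphi(T) = \tilde{T}$ for all $T \in U^*$. The image $\varphi(U^*)$ is now a subspace of $V^*$. By identifying $U^*$ with the image $\varphi(U^*)$, one can say $U^*$ is a subspace of $V^*$.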


Problem 3.23 Let $U$ and $W$ be subspaces of a vector space $V$. Show that $U \subseteq W$ if and only if $W^* \subseteq U^*$.


Let $S$ be an arbitrary subset of $V$, and let $\langle S\rangle$ denote the subspace of $V$ spanned by the vectors in $S$. Let $S^\perp = \{f \in V^* : f(x) = 0 \text{ for any } x \in S\}$. Then it is easy to show that $S^\perp$ is a subspace of $V^*$, $S^\perp = \langle S\rangle^\perp$, and $\dim\langle S\rangle + \dim S^\perp = n$.

Let $R$ be a subset of $V^*$. Then $R^\perp = \{x \in V : f(x) = 0 \text{ for any } f \in R\}$ is again a subspace of $V$ such that $R^\perp = \langle R\rangle^\perp$ and $\dim R^\perp + \dim\langle R\rangle = n$.
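For $V = \mathbb{R}^n$, writing each functional as $f(x) = c\cdot x$ turns $S^\perp$ into the null space of the matrix whose rows are the vectors of $S$, so $\dim\langle S\rangle + \dim S^\perp = n$ is the rank-nullity count. The following Python sketch (our own helper names, exact fractions) computes a basis of $S^\perp$ for a sample set $S$ chosen for illustration.

```python
from fractions import Fraction

def rref(rows, n):
    """Reduced row echelon form of a list of length-n rows; returns (R, pivot_columns)."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue                       # no pivot in this column: a free variable
        R[r], R[p] = R[p], R[r]
        pivot_value = R[r][c]
        R[r] = [x / pivot_value for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots

def annihilator(S, n):
    """Basis of S^perp in (R^n)^* (functional f(x) = c . x stored as c), and dim <S>.

    f kills every vector of S exactly when c solves the homogeneous system whose
    rows are the vectors of S, so S^perp is the null space of that system.
    """
    R, pivots = rref(S, n)
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        coeffs = [Fraction(0)] * n
        coeffs[free] = Fraction(1)
        for row, pc in zip(R, pivots):     # each pivot variable is fixed by the free one
            coeffs[pc] = -row[free]
        basis.append(coeffs)
    return basis, len(pivots)

S = [(1, 0, 1), (1, 2, 1), (2, 2, 2)]     # third vector = sum of the first two
basis, dim_span = annihilator(S, 3)
print(dim_span, len(basis))               # 2 1, so dim<S> + dim S^perp = 3
```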


Problem 3.24 For subspaces $U$ and $W$ of a vector space $V$, show that
(1) $(U + W)^\perp = U^\perp \cap W^\perp$, (2) $(U \cap W)^\perp = U^\perp + W^\perp$.


3.9 Applications 


3.9.1 Computer graphics 


One of the simple applications of linear transformations is to animation or the graphical display of pictures on a computer screen. For a simple illustration of the idea, let us consider a picture in the plane $\mathbb{R}^2$. Note that a picture or an image on a screen usually consists of a number of points, lines or curves connecting some of them, and information about how to fill the regions bounded by the lines and curves. Assuming that the computer has

3.9. Application: Computer graphics 137 


Figure 3.9: L.A.


information about how to connect the points and curves, a figure can be 
defined by a list of points. For example, consider the capital letters ‘LA’ 
below. 

This figure can be represented by a matrix of coordinates of the vertices. For the sake of brevity we write it just for 'L' as follows: the coordinates of the 6 vertices form a matrix
$$A = \begin{array}{r|cccccc} \text{vertices} & 1 & 2 & 3 & 4 & 5 & 6\\ \hline x\text{-coordinate} & 0.0 & 0.0 & 0.5 & 0.5 & 2.0 & 2.0\\ y\text{-coordinate} & 0.0 & 2.0 & 2.0 & 0.5 & 0.5 & 0.0 \end{array}$$


Of course, we assume that the computer knows, via some algorithm, which vertices are connected to which by lines. We know that a matrix, considered as a linear transformation, transforms line segments to line segments. Thus, by multiplying $A$ on the left by a matrix, the vertices are transformed to another set of vertices, and the line segments connecting the vertices are preserved. For example, the matrix $B = \begin{bmatrix} 1 & 0.25\\ 0 & 1 \end{bmatrix}$ transforms the matrix $A$ to the following form, which represents the new coordinates of the vertices:
$$BA = \begin{array}{r|cccccc} \text{vertices} & 1 & 2 & 3 & 4 & 5 & 6\\ \hline x\text{-coordinate} & 0.0 & 0.5 & 1.0 & 0.625 & 2.125 & 2.0\\ y\text{-coordinate} & 0.0 & 2.0 & 2.0 & 0.5 & 0.5 & 0.0 \end{array}$$

Now, the computer connects these vertices properly by lines according to the given algorithm and displays the changed figure on the screen, as on the left side of Figure 3.10. Multiplying $BA$ by the matrix $C = \begin{bmatrix} 1/2 & 0\\ 0 & 1 \end{bmatrix}$ shrinks the width of $BA$ by half, producing the right side of Figure 3.10. Thus, changes in the shape of a figure may be obtained by compositions of appropriate linear transformations. Now, the reader is encouraged to try various matrices, such as reflections, rotations, or any other linear




transformations, multiply $A$ by them, and see how the shape of the figure changes.
Figure 3.10: Tilting and Shrinking (left: $BA$; right: $(CB)A$)


Remark: Incidentally, one can see that the composition of a rotation by $\pi$ followed by a reflection about the $x$-axis is the same as the composition of the reflection followed by the rotation (see Figure 3.11). In general, however, a rotation and a reflection do not commute, and neither do two reflections.


Figure 3.11: Commutativity of a rotation and a reflection (rotation by $\pi$ and reflection, applied in either order)
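The remark can be confirmed numerically by comparing $FR$ and $RF$. In the sketch below (our own helper names), $F$ is the reflection about the $x$-axis and $G$ the reflection about the line $y = x$, an extra example we chose to show that two reflections need not commute. Entries are compared up to a small tolerance because $\sin\pi$ is not exactly zero in floating point.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def rotation(t):
    """Counterclockwise rotation of R^2 by the angle t."""
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]

def close(X, Y, tol=1e-12):
    """Entrywise comparison up to floating-point round-off."""
    return all(abs(a - b) < tol for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

def commute(P, Q):
    return close(matmul(P, Q), matmul(Q, P))

F = [[1.0, 0.0], [0.0, -1.0]]   # reflection about the x-axis
G = [[0.0, 1.0], [1.0, 0.0]]    # reflection about the line y = x

print(commute(F, rotation(math.pi)))      # True: rotation by pi and F commute
print(commute(F, rotation(math.pi / 2)))  # False: not in general
print(commute(F, G))                      # False: two reflections need not commute
```

A short calculation shows why: $F R(\theta)$ and $R(\theta) F$ differ only in the sign of the $\sin\theta$ entries, so they agree exactly when $\sin\theta = 0$.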


The above argument applies to a figure in any dimension. For instance, a $3\times 3$ matrix may be used to transform a figure in $\mathbb{R}^3$, since each point has 3 components.


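Example 3.28 It is easy to see that the matrices
$$R_{(x,\alpha)} = \begin{bmatrix} 1 & 0 & 0\\ 0 & \cos\alpha & -\sin\alpha\\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}, \quad R_{(y,\beta)} = \begin{bmatrix} \cos\beta & 0 & -\sin\beta\\ 0 & 1 & 0\\ \sin\beta & 0 & \cos\beta \end{bmatrix},$$
$$R_{(z,\gamma)} = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0\\ \sin\gamma & \cos\gamma & 0\\ 0 & 0 & 1 \end{bmatrix}$$
are the rotations about the $x$-, $y$- and $z$-axes by the angles $\alpha$, $\beta$ and $\gamma$, respectively.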

In general, the matrix that rotates $\mathbb{R}^3$ about a given axis appears frequently in many applications. One can easily express such a general rotation as a composition of the basic rotations $R_{(x,\alpha)}$, $R_{(y,\beta)}$ and $R_{(z,\gamma)}$.

Suppose that the axis of a rotation is the line determined by the vector $u = (\cos\alpha\cos\phi, \cos\alpha\sin\phi, \sin\alpha)$, $-\frac{\pi}{2} \le \alpha \le \frac{\pi}{2}$, $0 \le \phi < 2\pi$, in spherical coordinates. To find the matrix $R_{(u,\theta)}$ of the rotation about the $u$-axis by $\theta$, we first rotate the $u$-axis about the $z$-axis into the $xz$-plane by $R_{(z,-\phi)}$, and then into the $x$-axis by the rotation $R_{(y,-\alpha)}$ about the $y$-axis. Then the rotation about the $u$-axis is the same as the rotation about the $x$-axis followed by the inverses of the above rotations, i.e., take the rotation $R_{(x,\theta)}$ about the $x$-axis, and then get back to the rotation about the $u$-axis via $R_{(y,\alpha)}$ and $R_{(z,\phi)}$. In summary,
$$R_{(u,\theta)} = R_{(z,\phi)}\, R_{(y,\alpha)}\, R_{(x,\theta)}\, R_{(y,-\alpha)}\, R_{(z,-\phi)}.$$


Figure 3.12: A rotation about the u-axis 
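The composition formula can be tested directly: $R_{(u,\theta)}$ should fix $u$, be orthogonal, and have trace $1 + 2\cos\theta$ (the trace of $R_{(x,\theta)}$, which conjugation preserves). The Python sketch below uses the sign conventions of Example 3.28 and recovers $\alpha$ and $\phi$ from a nonzero axis vector; the function names, the sample axis $(1, 1, 1)$ and the angle $0.7$ are our own illustrative choices.

```python
import math

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def R_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def R_y(b):
    # Sign convention of Example 3.28 (-sin in the top-right corner).
    c, s = math.cos(b), math.sin(b)
    return [[c, 0, -s], [0, 1, 0], [s, 0, c]]

def R_z(g):
    c, s = math.cos(g), math.sin(g)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rotation_about_axis(u, theta):
    """R_(u,theta) = R_z(phi) R_y(alpha) R_x(theta) R_y(-alpha) R_z(-phi), u nonzero."""
    norm = math.sqrt(sum(t * t for t in u))
    alpha = math.asin(u[2] / norm)   # u/|u| = (cos a cos p, cos a sin p, sin a)
    phi = math.atan2(u[1], u[0])
    M = R_z(phi)
    for factor in (R_y(alpha), R_x(theta), R_y(-alpha), R_z(-phi)):
        M = matmul(M, factor)
    return M

u, theta = (1, 1, 1), 0.7
R = rotation_about_axis(u, theta)
Ru = [sum(R[i][j] * u[j] for j in range(3)) for i in range(3)]
print(Ru)                                                # approximately [1, 1, 1]
print(R[0][0] + R[1][1] + R[2][2], 1 + 2 * math.cos(theta))  # equal up to round-off
```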


Problem 3.25 Find the matrix $R_{(u,\theta)}$ for the rotation about the line determined
by u = (1,1, 1) by 7 


Remarks: An important transformation of $\mathbb{R}^3$ is a translation, which is a transformation $T : \mathbb{R}^3 \to \mathbb{R}^3$ defined by $T(\mathbf{x}) = \mathbf{x} + \mathbf{a}$ for a fixed vector $\mathbf{a} \in \mathbb{R}^3$. The translations of $\mathbb{R}^3$ other than the identity transformation are not linear (they are called affine transformations), so they have no matrix representations on $\mathbb{R}^3$. However, if we identify the vector space $\mathbb{R}^3$ with a




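hyperplane in $\mathbb{R}^4$, such a translation can be represented by a $4\times 4$ matrix over $\mathbb{R}^4$. Let
$$X = \{(x, y, z, 1) \in \mathbb{R}^4 : (x, y, z) \in \mathbb{R}^3\}$$
be the hyperplane in $\mathbb{R}^4$ consisting of the solutions of $w = 1$, where $w$ is the fourth coordinate. Then the identification of $\mathbb{R}^3$ with this hyperplane $X$ is given by the correspondence $\mathbf{x} = (x, y, z) \leftrightarrow (x, y, z, 1) = (\mathbf{x}, 1)$. Then any transformation on $\mathbb{R}^3$ that is a composition of a linear transformation $A$ and a translation by $\mathbf{a} = (a_1, a_2, a_3)$ can be represented by a $4\times 4$ matrix:
$$T(\mathbf{x}) = A\mathbf{x} + \mathbf{a} \ \text{ in } \mathbb{R}^3, \quad\text{or}\quad \begin{bmatrix} A & \mathbf{a}\\ \mathbf{0}^T & 1 \end{bmatrix} \begin{bmatrix} \mathbf{x}\\ 1 \end{bmatrix} = \begin{bmatrix} A\mathbf{x} + \mathbf{a}\\ 1 \end{bmatrix} \ \text{ in } \mathbb{R}^4.$$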
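In code, the $4\times 4$ representation is just the $3\times 3$ block $A$ with the translation vector appended as a fourth column and the row $(0, 0, 0, 1)$ added. A useful consequence is that composing affine maps becomes ordinary matrix multiplication. The Python sketch below illustrates this; the rotation, translation vectors and helper names are our own sample choices.

```python
def homogeneous(A, a):
    """4x4 matrix [[A, a], [0, 1]] representing x -> A x + a on R^3."""
    return [list(A[i]) + [a[i]] for i in range(3)] + [[0, 0, 0, 1]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def apply(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]

A = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]   # rotation by pi/2 about the z-axis (example)
a = [1, 2, 3]                            # translation vector (example)
M = homogeneous(A, a)

x = [4, 5, 6]
print(apply(M, x + [1]))     # [-4, 6, 9, 1]: A x + a, followed by the extra coordinate 1

# A further translation composed after M is again a single 4x4 matrix.
T_b = homogeneous([[1, 0, 0], [0, 1, 0], [0, 0, 1]], [10, 0, -1])
print(apply(matmul(T_b, M), x + [1]))   # [6, 6, 8, 1]
```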


3.10 Exercises 


3.1. Which of the following transformations $T$ are linear transformations?
(1) $T(x, y) = (x^2 - y^2,\ x^2 + y^2)$.
(2) $T(x, y, z) = (x + y,\ 0,\ 2x + 4z)$.
(3) $T(x, y) = (\sin x,\ y)$.
(4) $T(x, y) = (x + 1,\ 2y,\ x + y)$.
(5) $T(x, y, z) = (|x|,\ 0)$.


3.2. Let $T : P_2(\mathbb{R}) \to P_3(\mathbb{R})$ be a linear transformation such that $T(1) = 1$, $T(x) = x^2$, and $T(x^2) = x^3 + x$. Find $T(ax^2 + bx + c)$.
3.3. Find $S\circ T$ and/or $T\circ S$ whenever it is defined.
(1) $T(x, y, z) = (x - y + z,\ x + z)$, $S(x, y) = (x,\ x - y,\ y)$;
(2) $T(x, y) = (x,\ 3y + x,\ 2x - 4y,\ y)$, $S(x, y, z) = (2x,\ y)$.
3.4. Let $S : C(\mathbb{R}) \to C(\mathbb{R})$ be the transformation on the vector space $C(\mathbb{R})$ defined by, for $f \in C(\mathbb{R})$,
$$S(f)(x) = f(x) - \int_1^x u f(u)\,du.$$
Show that $S$ is a linear transformation on the vector space $C(\mathbb{R})$.


3.5. Let $T$ be a linear transformation on a vector space $V$ such that $T^2 = \mathrm{Id}$ and $T \ne \mathrm{Id}$. Let $U = \{v \in V : T(v) = v\}$ and $W = \{v \in V : T(v) = -v\}$. Show that
(1) at least one of $U$ and $W$ is a nonzero subspace of $V$;
(2) $U \cap W = \{0\}$;
(3) $V = U + W$.

3.6. If $T : \mathbb{R}^3 \to \mathbb{R}^3$ is defined by $T(x, y, z) = (2x - z,\ 3x - 2y,\ x - 2y + z)$,
(1) determine the null space $N(T)$ of $T$,
(2) determine whether $T$ is one-to-one,
(3) find a basis for $N(T)$.

3.7. Show that each of the following linear transformations $T$ on $\mathbb{R}^3$ is invertible, and find a formula for $T^{-1}$:
(1) $T(x, y, z) = (3x,\ x - y,\ 2x + y + z)$.
(2) $T(x, y, z) = (2x,\ 4x - y,\ 2x + 3y - z)$.

3.8. Let $S, T : V \to V$ be linear transformations on a vector space $V$.
(1) Show that if $T\circ S$ is one-to-one, then $T$ is an isomorphism.
(2) Show that if $T\circ S$ is onto, then $T$ is an isomorphism.
(3) Show that if $T^k$ is an isomorphism for some positive integer $k$, then $T$ is an isomorphism.

3.9. Let $T$ be a linear transformation from $\mathbb{R}^3$ to $\mathbb{R}^2$, and let $S$ be a linear transformation from $\mathbb{R}^2$ to $\mathbb{R}^3$. Prove that the composition $S\circ T$ is not invertible.

3.10. Let $T$ be a linear transformation on a vector space $V$ satisfying $T - T^2 = \mathrm{Id}$. Show that $T$ is invertible.

3.11. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation given by $T(x, y, z) = (x + y,\ y + z,\ x + z)$. Let $C$ denote the unit cube in $\mathbb{R}^3$ determined by the standard basis $e_1, e_2, e_3$. Find the volume of the image parallelepiped $T(C)$ of $C$ under $T$.

3.12. Let $T : P_3(\mathbb{R}) \to P_3(\mathbb{R})$ be the linear transformation defined by
$$Tf(x) = f''(x) - 4f'(x) + f(x).$$
Find the matrix $[T]_\alpha$ for the basis $\alpha = \{x,\ 1 + x,\ x + x^2,\ x^3\}$.

3.13. Let $T$ be the linear transformation on $\mathbb{R}^2$ defined by $T(x, y) = (-y, x)$.
(1) What is the matrix of $T$ with respect to an ordered basis $\alpha = \{v_1, v_2\}$, where $v_1 = (1, 2)$, $v_2 = (1, -1)$?
(2) Show that for every real number $c$ the linear transformation $T - c\,\mathrm{Id}$ is invertible.

3.14. Find the matrix representation of each of the following linear transformations $T$ on $\mathbb{R}^2$ with respect to the standard basis $\{e_1, e_2\}$.
(1) $T(x, y) = (2y,\ 3x - y)$.
(2) $T(x, y) = (3x - 4y,\ x + 5y)$.


4 2 1 
tet M=| 4 1 i! 


(1) Find the unique linear transformation $T : \mathbb{R}^3 \to \mathbb{R}^2$ so that $M$ is the associated matrix of $T$ with respect to the bases
aff h LTP (LU)
(2) Find $T(x, y, z)$.

3.16. Find the matrix representation of each of the following linear transformations $T$ on $P_2(\mathbb{R})$ with respect to the basis $\{1, x, x^2\}$.
$T : p(x) \mapsto p(x + 1)$.

3.17. Consider the following ordered bases of $\mathbb{R}^3$: the standard basis $\alpha = \{e_1, e_2, e_3\}$ and $\beta = \{u_1 = (1, 1, 1),\ u_2 = (1, 1, 0),\ u_3 = (1, 0, 0)\}$.
(1) Find the transition matrix $P$ from $\alpha$ to $\beta$.
(2) Find the transition matrix $Q$ from $\beta$ to $\alpha$.
(3) Verify that $Q = P^{-1}$.
(4) Show that $[v]_\beta = P[v]_\alpha$ for any vector $v \in \mathbb{R}^3$.
(5) Show that $[T]_\beta = Q^{-1}[T]_\alpha Q$ for the linear transformation $T$ defined by $T(x, y, z) = (2y + z,\ x - 4y,\ 3x)$.

3.18. Show that there are no matrices $A$ and $B$ in $M_{n\times n}(\mathbb{R})$ such that $AB - BA = I_n$.

3.19. Let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be the linear transformation defined by $T(x, y, z) = (3x + 2y - 4z,\ x - 5y + 3z)$, and let $\alpha = \{(1, 1, 1), (1, 1, 0), (1, 0, 0)\}$ and $\beta = \{(1, 3), (2, 5)\}$ be bases for $\mathbb{R}^3$ and $\mathbb{R}^2$, respectively.
(1) Find the associated matrix $[T]_\alpha^\beta$ for $T$.
(2) Verify $[T]_\alpha^\beta [v]_\alpha = [T(v)]_\beta$ for any $v \in \mathbb{R}^3$.

3.20. Find the transition matrix $[\mathrm{Id}]_\alpha^\beta$ from $\alpha$ to $\beta$, when
(1) $\alpha = \{(2, 3), (0, 1)\}$, $\beta = \{(6, 4), (4, 8)\}$;
(2) $\alpha = \{(5, 1), (1, 2)\}$, $\beta = \{(1, 0), (0, 1)\}$;
(3) $\alpha = \{(1, 1, 1), (1, 1, 0), (1, 0, 0)\}$, $\beta = \{(2, 0, 3), (-1, 4, 1), (3, 2, 5)\}$;
(4) $\alpha = \{t, 1, t^2\}$, $\beta = \{3 + 2t + t^2,\ t^2 - 4,\ 2 + t\}$.

3.21. Show that all matrices of the form $\begin{bmatrix} \cos\theta & \sin\theta\\ \sin\theta & -\cos\theta \end{bmatrix}$ are similar.

3.22. Show that the matrix
So
A = | 1
ii |
cannot be similar to a diagonal matrix.

3.23. Are the matrices
=. Or
2 5
1 6
0 1
and $\begin{bmatrix} -1 & 0 & 1\\ 0 & 4 & 2\\ 0 & 0 & 3 \end{bmatrix}$ similar?




3.24. For a linear transformation T on a vector space V, show that T is one-to-one 
if and only if its transpose T* is one-to-one. 


3.25. Let $T : \mathbb{R}^3 \to \mathbb{R}^3$ be the linear transformation defined by
$T(x, y, z) = (2y + z,\ -x + 4y + z,\ x + z)$.
Compute $[T]_\alpha$ and $[T^*]_{\alpha^*}$ for the standard basis $\alpha = \{e_1, e_2, e_3\}$.

3.26. Let $T$ be the linear transformation from $\mathbb{R}^3$ into $\mathbb{R}^2$ defined by
$T(x_1, x_2, x_3) = (x_1 + x_2,\ 2x_3 - x_1)$.
(1) For the standard ordered bases $\alpha$ and $\beta$ for $\mathbb{R}^3$ and $\mathbb{R}^2$ respectively, find the associated matrix for $T$ with respect to the bases $\alpha$ and $\beta$.
(2) Let $\alpha = \{x_1, x_2, x_3\}$ and $\beta = \{y_1, y_2\}$, where $x_1 = (1, 0, -1)$, $x_2 = (1, 1, 1)$, $x_3 = (1, 0, 0)$, and $y_1 = (0, 1)$, $y_2 = (1, 0)$. Find the associated matrices $[T]_\alpha^\beta$ and $[T^*]_{\beta^*}^{\alpha^*}$.

3.27. Let $T$ be the linear transformation from $\mathbb{R}^3$ to $\mathbb{R}^4$ defined by
$T(x, y, z) = (2x + y + 4z,\ x + y + 2z,\ y + 2z,\ x + y + 3z)$.
Find the image and the kernel of $T$. What is the dimension of $\mathrm{Im}(T)$? Find $[T]_\alpha^\beta$ and $[T^*]_{\beta^*}^{\alpha^*}$, where
$\alpha = \{(1, 0, 0), (0, 1, 0), (0, 0, 1)\}$,
$\beta = \{(1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0), (1, 1, 1, 1)\}$.


3.28. Let $T$ be the linear transformation on $V = \mathbb{R}^3$ for which the associated matrix with respect to the standard ordered basis is


reg 


-1 3 4 
Find the bases for the kernel and the image of the transpose T* on V*. 


3.29. Define three linear functionals on the vector space $V = P_2(\mathbb{R})$ by
$$f_1(p) = \int_0^1 p(x)\,dx, \quad f_2(p) = \int_0^2 p(x)\,dx, \quad f_3(p) = \int_0^{-1} p(x)\,dx.$$
Show that $\{f_1, f_2, f_3\}$ is a basis for $V^*$ by finding its dual basis for $V$.


3.30. Determine whether or not the following statements are true in general, and justify your answers.

(1) For a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$, $\mathrm{Ker}(T) = \{0\}$ if $m > n$.

(2) For a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$, $\mathrm{Ker}(T) \ne \{0\}$ if $m < n$.

(3) A linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$ is one-to-one if and only if the null space of $[T]_\alpha^\beta$ is $\{0\}$, for any bases $\alpha$ and $\beta$ of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively.

(4) For a linear transformation $T$ on $\mathbb{R}^n$, the dimension of the image of $T$ is equal to that of the row space of $[T]_\alpha$, for any basis $\alpha$ for $\mathbb{R}^n$.

(5) Any polynomial $p(x)$ is linear if and only if the degree of $p(x)$ is 1.

(6) Let $T : \mathbb{R}^3 \to \mathbb{R}^2$ be a transformation given as $T(x) = (T_1(x), T_2(x))$ for any $x \in \mathbb{R}^3$. Then $T$ is linear if and only if its coordinate functions $T_i$, $i = 1, 2$, are linear.




(7) For a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$, if $[T]_\alpha^\beta = I_n$ for some bases $\alpha$ and $\beta$ of $\mathbb{R}^n$, then $T$ must be the identity transformation.
(8) If a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^n$ is one-to-one, then any matrix representation of $T$ is nonsingular.
(9) Any $m\times n$ matrix $A$ can be a matrix representation of a linear transformation $T : \mathbb{R}^n \to \mathbb{R}^m$.
(10) $\det : M_{n\times n}(\mathbb{R}) \to \mathbb{R}$ is a linear functional.


Chapter 4 


Determinants 


4.1 Basic properties of the determinants 


Our primary interest in Chapter 1 was in the solvability or solutions of a system $Ax = b$ of linear equations. In the previous two chapters, we introduced the mathematical structures related to the equation $Ax = b$. For an invertible matrix $A$, Theorem 1.9 shows that the system has a unique solution $x = A^{-1}b$ for any $b$. To study invertibility, we introduce the notion of the determinant as a real-valued function of square matrices, and then show that a square matrix $A$ is invertible if and only if the determinant of $A$ is not zero. In fact, it was shown in Chapter 1 that a $2\times 2$ matrix $A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}$ is invertible if and only if $ad - bc \ne 0$. This number is called the determinant of $A$, written $\det A$, and is defined formally as follows:


Definition 4.1 For a $2\times 2$ matrix $A = \begin{bmatrix} a & b\\ c & d \end{bmatrix} \in M_{2\times 2}(\mathbb{R})$, the determinant of $A$ is defined as $\det A = ad - bc$.


Geometrically, it turns out that the determinant of a $2\times 2$ matrix $A$ represents, up to sign, the area of the parallelogram in the $xy$-plane whose edges are the row vectors of $A$ (see Theorem 4.10). This notion of the determinant can be extended to higher-order square matrices in such a way that the same geometric meaning as in the $2\times 2$ case remains true. However, the formula in Definition 4.1 itself gives no idea of how to extend the determinant to higher-order matrices. For this, we first examine some fundamental properties of the determinant function defined in Definition 4.1.




146 Chapter 4. Determinants 


By a direct computation, one can easily verify that the function det in 
Definition 4.1 satisfies the following three properties: 


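Lemma 4.1
(1) $\det\begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} = 1$;
(2) $\det\begin{bmatrix} c & d\\ a & b \end{bmatrix} = -\det\begin{bmatrix} a & b\\ c & d \end{bmatrix}$;
(3) $\det\begin{bmatrix} ka + \ell a' & kb + \ell b'\\ c & d \end{bmatrix} = k\det\begin{bmatrix} a & b\\ c & d \end{bmatrix} + \ell\det\begin{bmatrix} a' & b'\\ c & d \end{bmatrix}$.

Proof: (2) $\det\begin{bmatrix} c & d\\ a & b \end{bmatrix} = bc - ad = -(ad - bc) = -\det\begin{bmatrix} a & b\\ c & d \end{bmatrix}$.
(3)
$$\begin{aligned} \det\begin{bmatrix} ka + \ell a' & kb + \ell b'\\ c & d \end{bmatrix} &= (ka + \ell a')d - (kb + \ell b')c\\ &= k(ad - bc) + \ell(a'd - b'c)\\ &= k\det\begin{bmatrix} a & b\\ c & d \end{bmatrix} + \ell\det\begin{bmatrix} a' & b'\\ c & d \end{bmatrix}. \end{aligned}$$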


In Lemma 4.5, it will be shown that if a function $f : M_{2\times 2}(\mathbb{R}) \to \mathbb{R}$ satisfies the properties (1)-(3) in Lemma 4.1, then it must be the function det defined in Definition 4.1, that is, $f(A) = ad - bc$. The properties (1)-(3) in Lemma 4.1 of the determinant on $M_{2\times 2}(\mathbb{R})$ enable us to define the determinant function for any $n\times n$ square matrix.


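Definition 4.2 A real-valued function $f : M_{n\times n}(\mathbb{R}) \to \mathbb{R}$ on the space of all $n\times n$ square matrices is called a determinant if it satisfies the following three rules:

(D1) The value of $f$ at the identity matrix is 1, i.e., $f(I_n) = 1$;
(D2) the value of $f$ changes sign if any two rows are interchanged;
(D3) $f$ is linear in the first row: that is, by definition,
$$f\begin{bmatrix} k\mathbf{r}_1 + \ell\mathbf{r}_1'\\ \mathbf{r}_2\\ \vdots\\ \mathbf{r}_n \end{bmatrix} = k\, f\begin{bmatrix} \mathbf{r}_1\\ \mathbf{r}_2\\ \vdots\\ \mathbf{r}_n \end{bmatrix} + \ell\, f\begin{bmatrix} \mathbf{r}_1'\\ \mathbf{r}_2\\ \vdots\\ \mathbf{r}_n \end{bmatrix},$$
where the $\mathbf{r}_i$'s denote the row vectors $[a_{i1} \cdots a_{in}]$ of a matrix.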


4.1. Basic properties of determinant 147 


Since all the row vectors of an $n\times n$ matrix $A$ belong to the $n$-space $\mathbb{R}^n$, the rule (D3) means that, if we restrict the domain of $f$ to the first row vectors in $\mathbb{R}^n$ and fix all the other row vectors, $f$ is a linear transformation from $\mathbb{R}^n$ into $\mathbb{R}$.

Lemma 4.1 shows that det on $2\times 2$ matrices satisfies the rules (D1)-(D3). In the next section, one can see that for each positive integer $n$ there always exists a function $f : M_{n\times n}(\mathbb{R}) \to \mathbb{R}$ satisfying the three rules (D1)-(D3), and that such a function is unique (existence and uniqueness). Therefore, we say 'the' determinant and designate it as 'det' for each order $n$.

Let us first derive some direct consequences of the rules (D1)-(D3).


Theorem 4.2 The determinant satisfies the following properties. 
(1) The determinant is linear in each row. 
(2) If A has either a zero row or two identical rows, then det(A) = 0. 


(3) The elementary row operation that adds a constant multiple of one row 
to another row leaves the determinant unchanged. 


Proof: (1) Any row can be moved to the first row by a row interchange, which changes the sign of the determinant by rule (D2); then use the linearity rule (D3), and then (D2) again to interchange the same rows back.
(2) The first statement is a direct consequence of (1) and Theorem 3.1. Since interchanging the two identical rows does not change the matrix itself but gives $\det(A) = -\det(A)$ by the rule (D2), the second statement follows.
(3) By a direct computation using (1), one gets
$$\det\begin{bmatrix} \vdots\\ \mathbf{r}_i + k\mathbf{r}_j\\ \vdots\\ \mathbf{r}_j\\ \vdots \end{bmatrix} = \det\begin{bmatrix} \vdots\\ \mathbf{r}_i\\ \vdots\\ \mathbf{r}_j\\ \vdots \end{bmatrix} + k\det\begin{bmatrix} \vdots\\ \mathbf{r}_j\\ \vdots\\ \mathbf{r}_j\\ \vdots \end{bmatrix},$$
in which the second term on the right side is zero by (2).


The determinant function is said to be alternating due to the rule (D2), and multilinear due to the property (1) in Theorem 4.2.

It is now easy to see the effect of elementary row operations on evaluations of the determinant. An elementary row operation (E1) that multiplies a row by a constant $k$ changes the determinant to $k$ times the determinant, by Theorem 4.2(1). An elementary row operation (E2) that interchanges two rows changes the sign of the determinant, by the rule (D2). An elementary row operation (E3) that adds a constant multiple of a row to another row does not change the determinant, by Theorem 4.2(3). From these observations, it is clear that, for an elementary matrix $E$,
$$\det(E) = \begin{cases} k & \text{if } E \text{ is of type (E1)},\\ -1 & \text{if } E \text{ is of type (E2)},\\ 1 & \text{if } E \text{ is of type (E3)}. \end{cases}$$
Therefore, for any elementary matrix $E$ and any matrix $A$,
$$\det(EA) = k\det(A) = \det(E)\det(A),$$
where $k$ is a nonzero constant or $\pm 1$, according to the type of $E$.
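To see $\det(EA) = \det(E)\det(A)$ concretely, the Python sketch below builds one elementary matrix of each type and compares both sides for a sample $3\times 3$ matrix $A$ (our own choice). It uses the six-term formula for $3\times 3$ determinants that is derived in Section 4.2; the matrices and helper names are illustrative only.

```python
def det3(A):
    """Six-term formula for the determinant of a 3x3 matrix (see Section 4.2)."""
    return (A[0][0] * A[1][1] * A[2][2] + A[0][1] * A[1][2] * A[2][0]
            + A[0][2] * A[1][0] * A[2][1] - A[0][0] * A[1][2] * A[2][1]
            - A[0][1] * A[1][0] * A[2][2] - A[0][2] * A[1][1] * A[2][0])

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

k = 5
E1 = [[1, 0, 0], [0, k, 0], [0, 0, 1]]   # (E1): multiply row 2 by k
E2 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]   # (E2): interchange rows 1 and 2
E3 = [[1, 0, 0], [0, 1, 0], [4, 0, 1]]   # (E3): add 4 times row 1 to row 3

A = [[2, 1, 3], [0, -1, 4], [5, 2, 1]]   # sample matrix

for E in (E1, E2, E3):
    # det(E) is k, -1, 1 respectively, and det(EA) = det(E) det(A) each time.
    print(det3(E), det3(matmul(E, A)) == det3(E) * det3(A))
```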


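Example 4.1 Consider a matrix
$$A = \begin{bmatrix} 1 & 1 & 1\\ a & b & c\\ b + c & c + a & a + b \end{bmatrix}.$$
If one adds the second row to the third, then the third row becomes
$$[\,a + b + c \quad a + b + c \quad a + b + c\,],$$
which is a scalar multiple of the first row. Thus, by Theorem 4.2, $\det(A) = 0$.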


Problem 4.1 Show that, for an $n\times n$ matrix $A$ and $k \in \mathbb{R}$, $\det(kA) = k^n\det(A)$.


Problem 4.2 Explain why $\det A = 0$ for
(1) $A = \begin{bmatrix} a+1 & a+4 & a+7\\ a+2 & a+5 & a+8\\ a+3 & a+6 & a+9 \end{bmatrix}$,
(2) A =
ie at at
ae oa a
a a® a


Further basic properties of the determinant are obtained in the following 
theorem. Recall that any square matrix can be transformed into an upper 
triangular matrix by forward elimination. 


Theorem 4.3 The determinant satisfies the following properties. 
(1) For a triangular matrix $A$, $\det(A) = a_{11}\cdots a_{nn}$.
(2) A matrix $A$ is invertible if and only if $\det(A) \ne 0$.
(3) For any two $n\times n$ matrices $A$ and $B$, $\det(AB) = \det(A)\det(B)$.
(4) $\det(A^T) = \det(A)$.




Proof: (1) If $A$ is a diagonal matrix, then it is clear that $\det(A) = a_{11}\cdots a_{nn}$ by the multilinearity in Theorem 4.2(1) and rule (D1). Suppose that $A$ is a lower triangular matrix. If $A$ has a zero diagonal entry, then a forward elimination, which does not change the determinant, produces a zero row. If $A$ does not have a zero diagonal entry, a forward elimination makes $A$ row equivalent to the diagonal matrix $D$ whose diagonal entries are unchanged. Thus, in the former case, $\det(A) = 0$ and the product of the diagonal entries is also zero. In the latter case, $\det(A) = \det(D) = a_{11}\cdots a_{nn}$. Similar arguments can be applied to an upper triangular matrix via back substitution.

(2) A square matrix $A$ is row equivalent to an upper triangular matrix $U$ through a forward elimination: that is, $A = PLU$ for some permutation matrix $P$ and an invertible matrix $L$ that is a product of elementary matrices of the third kind. Thus $\det(A) = \pm\det(U)$, and the invertibility of $U$ and that of $A$ are equivalent. However, $U$ is invertible if and only if $U$ has no zero diagonal entry by Corollary 1.10, which is equivalent to $\det(U) \ne 0$ by (1).

(3) If $A$ is not invertible, then neither is $AB$, and so $\det(AB) = 0 = \det(A)\det(B)$. If $A$ is invertible, it can be written as a product of elementary matrices by Theorem 1.9, say $A = E_1E_2\cdots E_k$. Then by induction on $k$,
$$\begin{aligned} \det(AB) &= \det(E_1E_2\cdots E_kB)\\ &= \det(E_1)\det(E_2)\cdots\det(E_k)\det(B)\\ &= \det(E_1E_2\cdots E_k)\det(B) = \det(A)\det(B). \end{aligned}$$


(4) Clearly, $A$ is not invertible if and only if $A^T$ is not. Thus, for a singular matrix $A$ we have $\det(A^T) = 0 = \det(A)$. If $A$ is invertible, then write it again as a product of elementary matrices, say $A = E_1E_2\cdots E_k$. But $\det(E) = \det(E^T)$ for any elementary matrix $E$. In fact, if $E$ is an elementary matrix of type (E2), then $E^T = E$, so $\det(E^T) = -1 = \det(E)$ by (D2); the other two kinds of elementary matrices are triangular, so that $\det(E) = \det(E^T)$ by (1). Hence, we have by (3)
$$\begin{aligned} \det(A^T) &= \det((E_1E_2\cdots E_k)^T)\\ &= \det(E_k^T\cdots E_2^TE_1^T)\\ &= \det(E_k^T)\cdots\det(E_2^T)\det(E_1^T)\\ &= \det(E_1)\det(E_2)\cdots\det(E_k)\\ &= \det(E_1E_2\cdots E_k)\\ &= \det(A). \end{aligned}$$




Remark: From the equality $\det(A) = \det(A^T)$, one could define the determinant in terms of columns instead of rows in Definition 4.2, and Theorem 4.2 is also true with 'columns' instead of 'rows'.


Example 4.2 Evaluate the determinant of 


$$A = \begin{bmatrix} 2 & -4 & 0 & 0\\ 1 & -3 & 0 & 1\\ 1 & 0 & -1 & 2\\ 3 & -4 & 3 & -1 \end{bmatrix}.$$


Solution: By using forward elimination, A can be transformed to an upper 
triangular matrix U. Since the forward elimination does not change the 
determinant, the determinant of A is simply the product of the diagonal 
entries of U: 


$$\det(A) = \det(U) = \det\begin{bmatrix} 2 & -4 & 0 & 0\\ 0 & -1 & 0 & 1\\ 0 & 0 & -1 & 4\\ 0 & 0 & 0 & 13 \end{bmatrix} = 2\cdot(-1)^2\cdot 13 = 26.$$
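The procedure of Example 4.2 can be sketched in Python as follows (exact arithmetic with fractions; the function name is our own). Besides the (E3) operations used in the example, it allows row interchanges when a pivot is zero, and each interchange flips the sign by rule (D2); the answer is the signed product of the diagonal of $U$, by Theorem 4.3(1).

```python
from fractions import Fraction

def det_by_elimination(A):
    """Determinant via forward elimination to an upper triangular matrix U.

    Row additions (E3) leave the determinant unchanged, each row interchange (E2)
    flips its sign, and det(U) is the product of its diagonal entries.
    """
    U = [[Fraction(x) for x in row] for row in A]
    n = len(U)
    sign = 1
    for c in range(n):
        p = next((r for r in range(c, n) if U[r][c] != 0), None)
        if p is None:
            return Fraction(0)      # no pivot in column c: U has a zero diagonal entry
        if p != c:
            U[c], U[p] = U[p], U[c]
            sign = -sign
        for r in range(c + 1, n):
            factor = U[r][c] / U[c][c]
            U[r] = [a - factor * b for a, b in zip(U[r], U[c])]
    result = Fraction(sign)
    for i in range(n):
        result *= U[i][i]
    return result

A = [[2, -4, 0, 0],
     [1, -3, 0, 1],
     [1, 0, -1, 2],
     [3, -4, 3, -1]]      # the matrix of Example 4.2
print(det_by_elimination(A))   # 26
```

Elimination needs on the order of $n^3$ arithmetic operations, which is why it is the practical way to evaluate determinants of larger matrices.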


Problem 4.3 Prove that if $A$ is invertible, then $\det(A^{-1}) = 1/\det(A)$.
Problem 4.4 Evaluate the determinant of each of the following matrices: 
2 438 

fee ee 
ge 5 ; » (2) 31 32 33 34 2o r? 2 1 zxz 
| at 42 43 44 ke r? xr? 1 


to 


4.2 Existence and uniqueness of the determinant 


In this section, we prove the following fundamental theorem for the determinant.


Theorem 4.4 (Existence and Uniqueness) For any natural number $n$, there exists a unique determinant function $\det : M_{n\times n}(\mathbb{R}) \to \mathbb{R}$ that satisfies the three rules (D1)-(D3) in Definition 4.2.


4.2. Existence and uniqueness of the determinant 151 


Let us investigate this theorem inductively on n. For n = 1, it is quite 
clear that any determinant function must be det|a] = a. To find an explicit 
formula of the determinant function for n > 1, we investigate the three rules 
of a determinant function for n = 2 and 3. 

For $2\times 2$ matrices: When $n = 2$, the existence comes from Lemma 4.1. The next lemma shows that any function $f : M_{2\times 2}(\mathbb{R}) \to \mathbb{R}$ satisfying the three rules (D1)-(D3) must be the det in Definition 4.1, which implies the uniqueness of the determinant function on $M_{2\times 2}(\mathbb{R})$.


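Lemma 4.5 If a function $f : M_{2\times 2}(\mathbb{R}) \to \mathbb{R}$ satisfies the rules (D1)-(D3), then $f\begin{bmatrix} a & b\\ c & d \end{bmatrix} = ad - bc$. That is, $f(A) = \det A$.

Proof: First, note that $f\begin{bmatrix} 0 & 1\\ 1 & 0 \end{bmatrix} = -1$ by the rules (D1) and (D2). Then
$$\begin{aligned} f\begin{bmatrix} a & b\\ c & d \end{bmatrix} &= f\begin{bmatrix} a & 0\\ c & d \end{bmatrix} + f\begin{bmatrix} 0 & b\\ c & d \end{bmatrix}\\ &= f\begin{bmatrix} a & 0\\ 0 & d \end{bmatrix} + f\begin{bmatrix} a & 0\\ c & 0 \end{bmatrix} + f\begin{bmatrix} 0 & b\\ 0 & d \end{bmatrix} + f\begin{bmatrix} 0 & b\\ c & 0 \end{bmatrix}\\ &= ad\, f\begin{bmatrix} 1 & 0\\ 0 & 1 \end{bmatrix} + ac\, f\begin{bmatrix} 1 & 0\\ 1 & 0 \end{bmatrix} + bd\, f\begin{bmatrix} 0 & 1\\ 0 & 1 \end{bmatrix} + bc\, f\begin{bmatrix} 0 & 1\\ 1 & 0 \end{bmatrix}\\ &= ad + 0 + 0 - bc = ad - bc, \end{aligned}$$
where the first three equalities come from the multilinearity (1) in Theorem 4.2, and the two zero terms vanish by Theorem 4.2(2), since their two rows are identical.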


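For $3\times 3$ matrices: Let $f : M_{3\times 3}(\mathbb{R}) \to \mathbb{R}$ be a function satisfying the three rules (D1)-(D3). By the same calculation as in the proof of Lemma 4.5 for $n = 2$, i.e., by repeated use of the three rules (D1)-(D3), one can derive an explicit formula for $f(A)$ of a matrix $A = [a_{ij}]$ in $M_{3\times 3}(\mathbb{R})$ as follows:
$$\begin{aligned} f\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{bmatrix} &= f\begin{bmatrix} a_{11} & 0 & 0\\ 0 & a_{22} & 0\\ 0 & 0 & a_{33} \end{bmatrix} + f\begin{bmatrix} 0 & a_{12} & 0\\ 0 & 0 & a_{23}\\ a_{31} & 0 & 0 \end{bmatrix} + f\begin{bmatrix} 0 & 0 & a_{13}\\ a_{21} & 0 & 0\\ 0 & a_{32} & 0 \end{bmatrix}\\ &\quad + f\begin{bmatrix} a_{11} & 0 & 0\\ 0 & 0 & a_{23}\\ 0 & a_{32} & 0 \end{bmatrix} + f\begin{bmatrix} 0 & a_{12} & 0\\ a_{21} & 0 & 0\\ 0 & 0 & a_{33} \end{bmatrix} + f\begin{bmatrix} 0 & 0 & a_{13}\\ 0 & a_{22} & 0\\ a_{31} & 0 & 0 \end{bmatrix}\\ &= a_{11}a_{22}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} - a_{13}a_{22}a_{31}. \end{aligned}$$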




That is, any real-valued function $f : M_{3\times 3}(\mathbb{R}) \to \mathbb{R}$ satisfying the rules (D1)-(D3) must be the sum of the 6 terms in the above formula. This proves the uniqueness of the determinant function on $M_{3\times 3}(\mathbb{R})$. On the other hand, one can easily show that the explicit formula
$$f(A) = \sum \pm\, a_{1j_1}a_{2j_2}a_{3j_3},$$
summed over the arrangements with $\{j_1, j_2, j_3\} = \{1, 2, 3\}$ for a matrix $A \in M_{3\times 3}(\mathbb{R})$, with each sign as in the six-term formula above, satisfies the three rules, which proves the existence when $n = 3$. Therefore, for $n = 3$, the formula shows both the uniqueness and the existence of the determinant function on $M_{3\times 3}(\mathbb{R})$.


Problem 4.5 Show that the given explicit formula of the determinant for $3\times 3$ matrices satisfies the three rules (D1)-(D3).


Remark: The explicit formula for the determinant of a $3\times 3$ matrix can easily be memorized by the following scheme. Copy the first two columns and put them on the right of the matrix, and compute the determinant by multiplying the entries on six diagonals with a + sign or a - sign as in Figure 4.1. (This is known as Sarrus's method for $3\times 3$ matrices. It has no analogue for matrices of higher order $n \ge 4$.)


Ua U42 Qag | eil „i2 
a21 Q% Q |1821 422 
gasi a33 asi |“a31 “a32 


Figure 4.1: Sarrus’s Method 
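As a quick numerical check of the scheme, here is a minimal Python sketch (the function name `sarrus` is our own, not from the text). Instead of physically copying two columns, it wraps the column indices modulo 3, which reads off the same six diagonals.

```python
def sarrus(a):
    """Determinant of a 3x3 matrix (a list of three rows) by Sarrus's scheme."""
    if len(a) != 3 or any(len(row) != 3 for row in a):
        raise ValueError("Sarrus's method applies only to 3x3 matrices")
    # Diagonals running down to the right in [A | first two columns]: + sign.
    plus = sum(a[0][j] * a[1][(j + 1) % 3] * a[2][(j + 2) % 3] for j in range(3))
    # Diagonals running down to the left: - sign.
    minus = sum(a[0][j] * a[1][(j - 1) % 3] * a[2][(j - 2) % 3] for j in range(3))
    return plus - minus


print(sarrus([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 0
```

The `% 3` indexing is only a compact stand-in for the copied columns; the six products are exactly the six terms of the explicit formula above.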


Problem 4.6 Evaluate the determinants of the following matrices by Sarrus’s 


method. 142 4 —2 2 
()A= 3o Ty (2)A= 1 3 -!l 
—2 2 3 —2 6 4 


To find a general formula for $n > 3$, we look at the above equation in more detail. The first step is to apply the multilinearity to each row separately and get a decomposition of the form

$$[\,a_{i1}\ \ a_{i2}\ \ a_{i3}\,] = [\,a_{i1}\ \ 0\ \ 0\,] + [\,0\ \ a_{i2}\ \ 0\,] + [\,0\ \ 0\ \ a_{i3}\,].$$

Thus $f(A)$ is a sum of $3^3 = 27$ determinants, and each row of each of these 27 matrices has only one entry from $A$. If two rows of one of these matrices are parallel, i.e., two entries from $A$ are in the same column, then its determinant is zero. That is, only those determinants in which the 3 entries $a_{ij}$ from $A$, one in each row, lie in different columns can be nonzero. There are $3!$ such terms, since there are $3!$ ways of arranging three numbers in three different column positions. Thus $f(A)$ reduces to the sum of $3! = 6$ simpler determinants, and we get the first equality.

The second equality is just the computation of these six determinants by using the rules (D2) and (1) of Theorem 4.3: a suitable sequence of column interchanges converts each of them into a diagonal matrix. Thus the determinant of each of them is just the product of its 3 entries with a $\pm$ sign, which is determined by the number of column interchanges.


For matrices of higher order $n > 3$: Again, we repeat the same procedure as in the case $n = 3$. Let $f : M_{n\times n}(\mathbb{R}) \to \mathbb{R}$ be a function satisfying the three rules (D1)-(D3).

(Step 1) First, we apply the multilinearity to each row of $A$ to expand $f(A)$ as the sum of $n^n$ determinants, each of which has just $n$ entries from $A$, one in each row. But if two entries from $A$ in two rows are in the same column (i.e., the two rows are parallel), the determinant of the matrix is zero.

(Step 2) Therefore, in each of the remaining matrices, the $n$ entries from $A$ are of the form

$$a_{1i},\ a_{2j},\ a_{3k},\ \ldots,\ a_{n\ell}$$

with some column indices $i, j, k, \ldots, \ell$, which are just a rearrangement of $1, 2, \ldots, n$ without repetitions or omissions. Such a rearrangement is called a permutation of the $n$ numbers, which will be discussed below. Since there are exactly $n!$ permutations, the number of determinants remaining in the expansion of $f(A)$ is $n!$.

(Step 3) It is easy to see that the determinant of each of the $n!$ remaining matrices is just the product $a_{1i}a_{2j}\cdots a_{n\ell}$ of its $n$ entries with a $\pm$ sign, which is determined by the parity of the number of column interchanges that are necessary to make it a diagonal matrix.

The parity of the number of column interchanges can be described in mathematical terms as follows:

Definition 4.3 A permutation of $n$ objects is a one-to-one function from the set of $n$ objects onto itself.

In most cases, we denote the set of $n$ integers by $N_n = \{1, 2, \ldots, n\}$. A permutation $\sigma$ of $N_n$ assigns a number $\sigma(i)$ in $N_n$ to each number $i$ in $N_n$, and this permutation $\sigma$ is usually denoted by

$$\sigma = \begin{pmatrix} 1 & 2 & \cdots & n \\ \sigma(1) & \sigma(2) & \cdots & \sigma(n) \end{pmatrix}.$$

Here, the first row is the usual layout of the elements of $N_n$ as the domain set, and the second row is the image set showing an arrangement of the numbers in $N_n$ without repetitions or omissions.

If $S_n$ denotes the set of all permutations of $N_n$, then one can easily see that $S_n$ has exactly $n!$ permutations (here $n! = n(n-1)\cdots 2\cdot 1$, called $n$ factorial). For example, $S_2$ has $2! = 2$, $S_3$ has $3! = 6$, and $S_4$ has $4! = 24$ permutations. In our case, the $n$ entries of a matrix in the expansion of $f(A)$ can be written as

$$a_{1\sigma(1)},\ a_{2\sigma(2)},\ a_{3\sigma(3)},\ \ldots,\ a_{n\sigma(n)}$$

for some permutation $\sigma$. The number of column interchanges necessary to convert such a matrix into a diagonal matrix is the same as the number of interchanges that convert the numbers $(\sigma(1), \sigma(2), \ldots, \sigma(n))$ into the order $(1, 2, \ldots, n)$.


Example 4.3 Suppose that one of the 6 matrices is of the form

$$\begin{bmatrix} 0 & 0 & a_{13} \\ 0 & a_{22} & 0 \\ a_{31} & 0 & 0 \end{bmatrix}.$$

Then the determinant of this matrix is

$$f\begin{bmatrix} 0 & 0 & a_{13} \\ 0 & a_{22} & 0 \\ a_{31} & 0 & 0 \end{bmatrix} = -f\begin{bmatrix} a_{13} & 0 & 0 \\ 0 & a_{22} & 0 \\ 0 & 0 & a_{31} \end{bmatrix} = -a_{13}a_{22}a_{31}.$$

Even though there may be several ways of interchanging columns to convert the given matrix into a diagonal matrix, one can see that the parity (even or odd) of the number of column interchanges is always the same. In our example, the arrangement of the column indices of this matrix corresponds to the permutation $\sigma = (3, 2, 1)$, and a column interchange is the same as an interchange of these column indices. From the given arrangement of the column indices 3, 2, 1, one can take either just one interchange of 3 and 1, or three interchanges (3 and 2, then 3 and 1, and then 2 and 1) to arrive at the order 1, 2, 3, which represents the diagonal matrix. In either case the parity is odd, which gives the "−" sign in the determinant.


Definition 4.4 A permutation $\sigma = (j_1, j_2, \ldots, j_n)$ is said to have an inversion if $j_s > j_t$ for $s < t$ (i.e., a larger number precedes a smaller number).

For example, the permutation $\sigma = (3, 1, 5, 4, 2)$ has five inversions, since 3 precedes 1 and 2; 5 precedes 4 and 2; and 4 precedes 2. Note that the identity $(1, 2, \ldots, n)$ is the only permutation without inversions.

If the column indices of the entries of a matrix are given as a permutation $\sigma$, then the number of column interchanges necessary to convert it into a diagonal matrix is the same as the number of interchanges of numbers in $\sigma$ necessary to turn $\sigma$ into the identity. The parity of this number is the same as the parity of the number of inversions in $\sigma$ (if only adjacent numbers are interchanged, the two numbers are exactly equal). Hence the sign of the determinant is determined only by the parity of the number of column interchanges, which is independent of the method chosen.
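The claim that only the parity matters can be checked by brute force. The Python sketch below (function names are illustrative, not from the text) counts inversions directly and also counts the adjacent interchanges performed by a bubble sort. Each adjacent interchange of an out-of-order pair removes exactly one inversion, so the two counts agree; non-adjacent interchanges, like the single interchange of 3 and 1 in Example 4.3, may use fewer steps, but always with the same parity.

```python
from itertools import permutations


def inversions(perm):
    """Number of pairs s < t with perm[s] > perm[t] (Definition 4.4)."""
    n = len(perm)
    return sum(1 for s in range(n) for t in range(s + 1, n) if perm[s] > perm[t])


def adjacent_swaps_to_sort(perm):
    """Bubble-sort a copy of perm and count the adjacent interchanges it makes."""
    p = list(perm)
    swaps = 0
    for end in range(len(p) - 1, 0, -1):
        for k in range(end):
            if p[k] > p[k + 1]:
                p[k], p[k + 1] = p[k + 1], p[k]
                swaps += 1
    return swaps


print(inversions((3, 1, 5, 4, 2)))  # 5
print(adjacent_swaps_to_sort((3, 2, 1)), inversions((3, 2, 1)))  # 3 3
# For every permutation of 1..5, adjacent interchanges used == number of inversions.
assert all(inversions(p) == adjacent_swaps_to_sort(p) for p in permutations(range(1, 6)))
```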


Definition 4.5 A permutation is said to be even (or odd) if it has an even (or odd, respectively) number of inversions. For a permutation $\sigma$ in $S_n$, the sign of $\sigma$ is defined as

$$\operatorname{sgn}(\sigma) = \begin{cases} 1 & \text{if } \sigma \text{ is an even permutation} \\ -1 & \text{if } \sigma \text{ is an odd permutation} \end{cases} = (-1)^k,$$

where $k$ is the number of inversions of $\sigma$.


For example, when n = 3, the permutations (1, 2, 3), (2, 3, 1) and 
(3, 1, 2) are even, while the permutations (1, 3, 2), (2, 1, 3) and (3, 2, 1) 
are odd. 
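The classification above can be reproduced mechanically. The following is a minimal Python sketch (the helper `sgn` is our own name) that computes $(-1)^k$ from the inversion count and sorts the six permutations of $S_3$ into even and odd ones.

```python
from itertools import permutations


def sgn(perm):
    """Sign of a permutation: (-1)**k, k = number of inversions (Definition 4.5)."""
    n = len(perm)
    k = sum(1 for s in range(n) for t in range(s + 1, n) if perm[s] > perm[t])
    return (-1) ** k


even = [p for p in permutations((1, 2, 3)) if sgn(p) == 1]
odd = [p for p in permutations((1, 2, 3)) if sgn(p) == -1]
print(even)  # [(1, 2, 3), (2, 3, 1), (3, 1, 2)]
print(odd)   # [(1, 3, 2), (2, 1, 3), (3, 2, 1)]
```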

Now, the determinants of the remaining $n!$ matrices are of the form $\pm a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{n\sigma(n)}$ for a permutation $\sigma$. The sign $\pm$ is given by the parity of the number of column interchanges, that is, it is the sign of the permutation $\sigma$, which is $(-1)^k$ where $k$ is the number of inversions in $\sigma$. Thus, the determinant of such a matrix is equal to

$$\operatorname{sgn}(\sigma)\, a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{n\sigma(n)}.$$

This is called a signed elementary product of $A$. Finally, our discussion can be summarized as follows to get an explicit formula for $\det A$:


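Theorem 4.6 If $f : M_{n\times n}(\mathbb{R}) \to \mathbb{R}$ is a determinant function, then, for any $n \times n$ matrix $A$,

$$f(A) = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{n\sigma(n)}.$$

That is, $f(A)$ is the sum of all $n!$ signed elementary products of $A$.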
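Theorem 4.6 translates directly into code. The Python sketch below (names are illustrative) sums all $n!$ signed elementary products using `itertools.permutations`; indices are 0-based, so $S_n$ is realized as the permutations of `range(n)`. It is only practical for small $n$, which is exactly the point made at the start of Section 4.3.

```python
from itertools import permutations
from math import prod


def sgn(sigma):
    """(-1)**k where k is the number of inversions of sigma."""
    n = len(sigma)
    k = sum(1 for s in range(n) for t in range(s + 1, n) if sigma[s] > sigma[t])
    return -1 if k % 2 else 1


def det_leibniz(a):
    """Sum over sigma in S_n of sgn(sigma) * a[0][sigma[0]] * ... * a[n-1][sigma[n-1]]."""
    n = len(a)
    return sum(sgn(sigma) * prod(a[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))


print(det_leibniz([[1, 2], [3, 4]]))  # -2
```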


This shows that the determinant must be unique if it exists. On the other hand, if a function $f : M_{n\times n}(\mathbb{R}) \to \mathbb{R}$ is defined by the formula in Theorem 4.6, then it can easily be shown that this $f$ satisfies the three rules (D1)-(D3). Therefore, we have both existence and uniqueness of the determinant function for square matrices of any order $n \ge 1$, which proves Theorem 4.4. For all $n$, we use det for the unique determinant function $f$.


4.3 Cofactor expansion 


Even though we have found an explicit formula for the determinant in Theorem 4.6, it is not very helpful in actual computation, because one has to sum up $n!$ terms, and $n!$ becomes very large as $n$ grows. The amount of computation may be reduced if we reorganize the formula inductively as follows:

The first factor $a_{1\sigma(1)}$ in each of the $n!$ terms is one of the $n$ entries $a_{11}, a_{12}, \ldots, a_{1n}$ in the first row of $A$. Hence, one can divide the $n!$ terms of the expansion of $\det A$ into $n$ groups according to the value of $\sigma(1)$: say,

$$\det A = \sum_{\sigma \in S_n} \operatorname{sgn}(\sigma)\, a_{1\sigma(1)}a_{2\sigma(2)}\cdots a_{n\sigma(n)} = a_{11}A_{11} + a_{12}A_{12} + \cdots + a_{1n}A_{1n},$$

where $A_{1j}$, for $j = 1, 2, \ldots, n$, is defined as

$$A_{1j} = \sum_{\sigma \in S_n,\ \sigma(1) = j} \operatorname{sgn}(\sigma)\, a_{2\sigma(2)}\cdots a_{n\sigma(n)}.$$

It turns out that $A_{1j}$, for each $j = 1, \ldots, n$, is, up to a $\pm$ sign, the determinant of the submatrix $M_{1j}$ of $A$ obtained by deleting the first row and the $j$-th column of $A$. This $(n-1) \times (n-1)$ submatrix $M_{1j}$ is called the minor of $a_{1j}$.
In fact, observe that in each sum of $(n-1)!$ terms

$$a_{1j}A_{1j} = a_{1j} \sum_{\sigma \in S_n,\ \sigma(1) = j} \operatorname{sgn}(\sigma)\, a_{2\sigma(2)}\cdots a_{n\sigma(n)},$$

all the $(n-1)!$ permutations $\sigma$ take 1 to $j$ and permute the remaining numbers $1, \ldots, \hat{j}, \ldots, n$, where $\hat{j}$ means that $j$ is deleted. Thus the number of inversions in $\sigma$ is $j - 1$, which is caused by placing $j$ at position 1, plus the number of inversions in the permutation on $1, \ldots, \hat{j}, \ldots, n$; i.e., if we write $\sigma = (\sigma(1) (= j), \tau(2), \ldots, \tau(n))$, then

$$\operatorname{sgn}(\sigma) = (-1)^{j-1}\operatorname{sgn}(\tau),$$

where $\tau = (\sigma(2), \ldots, \sigma(n))$ is a permutation on $N_n - \{j = \sigma(1)\}$. But then

$$\sum_{\tau \in S_{n-1}} \operatorname{sgn}(\tau)\, a_{2\tau(2)}a_{3\tau(3)}\cdots a_{n\tau(n)} = \det M_{1j}$$

is simply the determinant of the minor $M_{1j}$, so that

$$A_{1j} = \sum_{\sigma \in S_n,\ \sigma(1) = j} \operatorname{sgn}(\sigma)\, a_{2\sigma(2)}\cdots a_{n\sigma(n)} = (-1)^{j-1} \sum_{\tau \in S_{n-1}} \operatorname{sgn}(\tau)\, a_{2\tau(2)}a_{3\tau(3)}\cdots a_{n\tau(n)} = (-1)^{1+j}\det M_{1j},$$

which is called the cofactor of the entry $a_{1j}$.


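Problem 4.7 Let $A = [\mathbf{c}_1 \cdots \mathbf{c}_n]$ be an $n \times n$ matrix with the column vectors $\mathbf{c}_j$. Show that $\det[\mathbf{c}_j\ \mathbf{c}_1 \cdots \mathbf{c}_{j-1}\ \mathbf{c}_{j+1} \cdots \mathbf{c}_n] = (-1)^{j-1}\det[\mathbf{c}_1 \cdots \mathbf{c}_j \cdots \mathbf{c}_n]$. Note that the same kind of equality holds when $A$ is written in row vectors.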


In summary, one gets an expansion of $\det A$ with respect to the first row:

$$\det A = a_{11}A_{11} + a_{12}A_{12} + \cdots + a_{1n}A_{1n} = a_{11}\det M_{11} - a_{12}\det M_{12} + \cdots + (-1)^{1+n}a_{1n}\det M_{1n}.$$

This is called the cofactor expansion of $\det A$ along the first row.

There is a similar expansion with respect to any other row, say the $i$-th row. To show this, first construct a new matrix $\tilde{A}$ from $A$ by shifting the $i$-th row of $A$ up to the first row and moving its preceding $i - 1$ rows down one by one. Then clearly $\det \tilde{A} = (-1)^{i-1}\det A$ by the rule (D2). Now, the expansion of $\det \tilde{A}$ with respect to its first row $[a_{i1} \cdots a_{in}]$ is

$$\det \tilde{A} = a_{i1}\tilde{A}_{11} + a_{i2}\tilde{A}_{12} + \cdots + a_{in}\tilde{A}_{1n},$$

where $\tilde{A}_{1j} = (-1)^{1+j}\det \tilde{M}_{1j}$. But $\tilde{M}_{1j} = M_{ij}$, which is, by definition, the minor of $a_{ij}$. Therefore, if we set

$$A_{ij} = (-1)^{i-1}\tilde{A}_{1j} = (-1)^{i+j}\det M_{ij},$$

which is called the cofactor of $a_{ij}$, then

$$\det A = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in}.$$


One can do the same with the column vectors because $\det A^T = \det A$. This gives the following theorem:

Theorem 4.7 Let $A$ be an $n \times n$ matrix. Then,

(1) for each $1 \le i \le n$,
$$\det A = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in},$$
called the cofactor expansion of $\det A$ along the $i$-th row;

(2) for each $1 \le j \le n$,
$$\det A = a_{1j}A_{1j} + a_{2j}A_{2j} + \cdots + a_{nj}A_{nj},$$
called the cofactor expansion of $\det A$ along the $j$-th column.


This cofactor expansion is the usual way of defining the determinant 
inductively on n. 


Remark: The signs $(-1)^{i+j}$ of the cofactors in the cofactor expansion along any row or any column alternate as follows:

$$\begin{bmatrix} + & - & + & \cdots & (-1)^{1+n} \\ - & + & - & \cdots & (-1)^{2+n} \\ + & - & + & \cdots & (-1)^{3+n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ (-1)^{n+1} & (-1)^{n+2} & (-1)^{n+3} & \cdots & (-1)^{n+n} \end{bmatrix}$$

Therefore, the determinant of an $n \times n$ matrix $A$ is the sum of the products of the entries in any one row (or any one column) with their cofactors.
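Theorem 4.7 can be turned into a short recursive routine. The Python sketch below (function names are illustrative) expands along a chosen row, skips zero entries exactly as suggested after Example 4.4, and obtains column expansions by expanding the transpose, using $\det A^T = \det A$. The test matrix is the one of Example 4.5.

```python
def minor(a, i, j):
    """The submatrix M_ij of a: delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det_cofactor(a, i=0):
    """det A by cofactor expansion along row i (Theorem 4.7 (1)), recursively."""
    n = len(a)
    if n == 0:
        return 1  # determinant of the empty matrix
    if n == 1:
        return a[0][0]
    # (-1)**(i + j) with 0-based i, j equals (-1)**((i+1) + (j+1)).
    # Zero entries are skipped: their cofactors never need to be evaluated.
    return sum(a[i][j] * (-1) ** (i + j) * det_cofactor(minor(a, i, j))
               for j in range(n) if a[i][j] != 0)


def transpose(a):
    return [list(col) for col in zip(*a)]


A = [[1, -1, 2, -1], [-3, 4, 1, -1], [2, -5, -3, 8], [-2, 6, -4, 1]]
print([det_cofactor(A, i) for i in range(4)])             # [154, 154, 154, 154]
print([det_cofactor(transpose(A), j) for j in range(4)])  # [154, 154, 154, 154]
```

Every row and every column gives the same value, as the theorem asserts.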


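Example 4.4 Let $A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$. Then the cofactors of $a_{11}$, $a_{12}$ and $a_{13}$ are

$$A_{11} = (-1)^{1+1}\det\begin{bmatrix} 5 & 6 \\ 8 & 9 \end{bmatrix} = 5\cdot 9 - 8\cdot 6 = -3,$$
$$A_{12} = (-1)^{1+2}\det\begin{bmatrix} 4 & 6 \\ 7 & 9 \end{bmatrix} = (-1)(4\cdot 9 - 7\cdot 6) = 6,$$
$$A_{13} = (-1)^{1+3}\det\begin{bmatrix} 4 & 5 \\ 7 & 8 \end{bmatrix} = 4\cdot 8 - 7\cdot 5 = -3,$$

respectively. Hence the expansion of $\det A$ along the first row is

$$\det A = a_{11}A_{11} + a_{12}A_{12} + a_{13}A_{13} = 1\cdot(-3) + 2\cdot 6 + 3\cdot(-3) = 0.$$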


Theorem 4.7 suggests that, in the cofactor expansion of $\det A$ along a row or a column, the evaluation of $A_{ij}$ can be avoided whenever $a_{ij} = 0$. Therefore, the computation of the determinant is simplified by making the cofactor expansion along a row or a column that contains as many zero entries as possible. For example, forward elimination applied to a square matrix $A$ produces an upper triangular matrix $U$, and so the determinant of $A$ is just the product of the pivots in $U$, up to the sign caused by possible row interchanges. The next examples illustrate this method.
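The elimination method can be sketched as follows. This is a minimal Python illustration under our own choices: exact arithmetic with `fractions.Fraction`, and the first nonzero entry in a column taken as the pivot. Adding a multiple of one row to another leaves the determinant unchanged, and each row interchange flips its sign, so the result is the signed product of the pivots.

```python
from fractions import Fraction


def det_elimination(a):
    """Forward elimination to an upper triangular U; det A = (+/-) product of pivots."""
    u = [[Fraction(x) for x in row] for row in a]
    n = len(u)
    sign = 1
    for col in range(n):
        pivot_row = next((r for r in range(col, n) if u[r][col] != 0), None)
        if pivot_row is None:
            return Fraction(0)  # no pivot in this column: A is singular
        if pivot_row != col:
            u[col], u[pivot_row] = u[pivot_row], u[col]
            sign = -sign  # each row interchange flips the sign (rule (D2))
        for r in range(col + 1, n):
            factor = u[r][col] / u[col][col]
            for c in range(col, n):
                u[r][c] -= factor * u[col][c]  # does not change the determinant
    result = Fraction(sign)
    for k in range(n):
        result *= u[k][k]
    return result


A = [[1, -1, 2, -1], [-3, 4, 1, -1], [2, -5, -3, 8], [-2, 6, -4, 1]]
print(det_elimination(A))  # 154
```

For an $n \times n$ matrix this needs on the order of $n^3$ arithmetic operations, instead of the $n!$ terms of Theorem 4.6.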


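Example 4.5 Evaluate the determinant of

$$A = \begin{bmatrix} 1 & -1 & 2 & -1 \\ -3 & 4 & 1 & -1 \\ 2 & -5 & -3 & 8 \\ -2 & 6 & -4 & 1 \end{bmatrix}.$$

Solution: Adding suitable multiples of the first row to the other rows (which does not change the determinant) and then taking the cofactor expansion along the first column, we have

$$\det A = \det\begin{bmatrix} 1 & -1 & 2 & -1 \\ 0 & 1 & 7 & -4 \\ 0 & -3 & -7 & 10 \\ 0 & 4 & 0 & -1 \end{bmatrix} = \det\begin{bmatrix} 1 & 7 & -4 \\ -3 & -7 & 10 \\ 4 & 0 & -1 \end{bmatrix}.$$

Similarly, apply an elementary operation to the second column (add the first row to the second row), and take the cofactor expansion along the second column to get

$$\det\begin{bmatrix} 1 & 7 & -4 \\ -3 & -7 & 10 \\ 4 & 0 & -1 \end{bmatrix} = \det\begin{bmatrix} 1 & 7 & -4 \\ -2 & 0 & 6 \\ 4 & 0 & -1 \end{bmatrix} = (-1)^{1+2}\cdot 7\,\det\begin{bmatrix} -2 & 6 \\ 4 & -1 \end{bmatrix} = -7(2 - 24) = 154.$$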


Problem 4.8 Use cofactor expansions along a row or a column to evaluate the 
determinants of the following matrices: 


0111 
(1) A= 


Now 
oH OF 
Dwn 


0 
2 
2 


N Ohm 
Omme 


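Example 4.6 Let $A$ be the $4 \times 4$ matrix given below:

$$A = \begin{bmatrix} 1 & x & x^2 & x^3 \\ 1 & y & y^2 & y^3 \\ 1 & z & z^2 & z^3 \\ 1 & w & w^2 & w^3 \end{bmatrix}.$$

Show that $\det A = (x - y)(x - z)(x - w)(y - z)(y - w)(z - w)$.

Solution: Use Gaussian elimination:

$$\det A = \det\begin{bmatrix} 1 & x & x^2 & x^3 \\ 0 & y - x & y^2 - x^2 & y^3 - x^3 \\ 0 & z - x & z^2 - x^2 & z^3 - x^3 \\ 0 & w - x & w^2 - x^2 & w^3 - x^3 \end{bmatrix} = \det\begin{bmatrix} y - x & y^2 - x^2 & y^3 - x^3 \\ z - x & z^2 - x^2 & z^3 - x^3 \\ w - x & w^2 - x^2 & w^3 - x^3 \end{bmatrix}$$
$$= (y - x)(z - x)(w - x)\det\begin{bmatrix} 1 & y + x & y^2 + xy + x^2 \\ 1 & z + x & z^2 + xz + x^2 \\ 1 & w + x & w^2 + xw + x^2 \end{bmatrix}$$
$$= (y - x)(z - x)(w - x)\det\begin{bmatrix} 1 & y + x & y^2 + xy + x^2 \\ 0 & z - y & (z - y)(z + y + x) \\ 0 & w - y & (w - y)(w + y + x) \end{bmatrix}$$
$$= (y - x)(z - x)(w - x)(z - y)(w - y)\det\begin{bmatrix} 1 & z + y + x \\ 1 & w + y + x \end{bmatrix} = (y - x)(z - x)(w - x)(z - y)(w - y)(w - z).$$

Since this product has six factors, reversing the order of subtraction in each factor does not change its value, so $\det A = (x - y)(x - z)(x - w)(y - z)(y - w)(z - w)$.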
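The product formula is easy to test on sample values. The Python sketch below (helper names are ours) builds the Vandermonde matrix with rows $[1, x, x^2, \ldots]$, evaluates its determinant by the signed-elementary-product formula of Theorem 4.6, and compares it with $\prod_{i<j}(x_j - x_i)$ for the sample values $x, y, z, w = 2, 3, 5, 7$.

```python
from itertools import combinations, permutations
from math import prod


def det(a):
    """Sum of signed elementary products (Theorem 4.6); fine for small n."""
    n = len(a)
    total = 0
    for sigma in permutations(range(n)):
        k = sum(1 for s in range(n) for t in range(s + 1, n) if sigma[s] > sigma[t])
        total += (-1) ** k * prod(a[i][sigma[i]] for i in range(n))
    return total


def vandermonde(xs):
    """Rows [1, x, x**2, ..., x**(n-1)], one row for each x in xs."""
    n = len(xs)
    return [[x ** k for k in range(n)] for x in xs]


def vandermonde_product(xs):
    """Product over all i < j of (xs[j] - xs[i])."""
    return prod(xs[j] - xs[i] for i, j in combinations(range(len(xs)), 2))


x, y, z, w = 2, 3, 5, 7
print(det(vandermonde([x, y, z, w])))                              # 240
print((x - y) * (x - z) * (x - w) * (y - z) * (y - w) * (z - w))  # 240
```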


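Problem 4.9 In general, a matrix $A$ of the following form is called the Vandermonde matrix of order $n$ (see Example 4.6):

$$A = \begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{bmatrix}.$$

Show that

$$\det A = \prod_{1 \le i < j \le n} (x_j - x_i).$$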


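Problem 4.10 For $n > 1$, show that

$$\det\begin{bmatrix} 0 & x_1 & x_2 & \cdots & x_n \\ x_1 & a_{11} & a_{12} & \cdots & a_{1n} \\ x_2 & a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_n & a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = -\sum_{i=1}^{n}\sum_{j=1}^{n} A_{ij}x_i x_j,$$

where $A_{ij}$ is the cofactor of $a_{ij}$ in the $n \times n$ submatrix $A = [a_{ij}]$. (Hint: Find the cofactor expansion along the first row, and then compute the cofactor expansion along the first column in each cofactor of $x_j$.)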


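Problem 4.11 A tridiagonal matrix is an $n \times n$ matrix of the following form:

$$T_n = \begin{bmatrix} a_1 & b_1 & 0 & \cdots & 0 & 0 & 0 \\ c_1 & a_2 & b_2 & \cdots & 0 & 0 & 0 \\ 0 & c_2 & a_3 & \cdots & 0 & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & c_{n-2} & a_{n-1} & b_{n-1} \\ 0 & 0 & 0 & \cdots & 0 & c_{n-1} & a_n \end{bmatrix}.$$

Let $D_k = \det T_k$ for $k \ge 1$, and $D_0 = 1$.

(1) Show by induction that $D_k = a_kD_{k-1} - b_{k-1}c_{k-1}D_{k-2}$ for $k \ge 2$.

(2) If $a_j = b_j = c_j = b > 0$ for all $j$, then $D_n = bD_{n-1} - b^2D_{n-2}$ for $n \ge 2$. Show that

$$D_n = b^n\left(\cos\frac{n\pi}{3} + \frac{1}{\sqrt{3}}\sin\frac{n\pi}{3}\right).$$

(Hint: See Section 6.3.1 for recurrence relations.)

(3) For $b_j = 1$ and $c_j = -1$, let $(a_1 a_2 \cdots a_n) = \det T_n$. Show that

$$\frac{(a_1 a_2 \cdots a_n)}{(a_2 a_3 \cdots a_n)} = a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{\ddots + \cfrac{1}{a_{n-1} + \cfrac{1}{a_n}}}}}.$$

(Hint: By induction on $n$, it is enough to show that
$$\frac{(a_1 a_2 \cdots a_n)}{(a_2 a_3 \cdots a_n)} = a_1 + \cfrac{1}{\dfrac{(a_2 a_3 \cdots a_n)}{(a_3 \cdots a_n)}}.\,)$$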


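Problem 4.12 A square matrix $A$ of the following form is called a circulant matrix:

$$A = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_n \\ a_n & a_1 & a_2 & \cdots & a_{n-1} \\ a_{n-1} & a_n & a_1 & \cdots & a_{n-2} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_2 & a_3 & a_4 & \cdots & a_1 \end{bmatrix}.$$

Show by induction on $n \ge 2$ that

$$\det A = \prod_{j=0}^{n-1}\left(a_1 + a_2\omega_j + a_3\omega_j^2 + \cdots + a_n\omega_j^{n-1}\right),$$

where $\omega = e^{2\pi i/n}$, $i = \sqrt{-1}$, is the primitive $n$-th root of unity and $\omega_j = \omega^j$ for $j = 0, 1, \ldots, n-1$.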


Problem 4.13 Evaluate the determinant of 


1 21 1 1 0 a b c 
aae a anela a s 
zi yt z2 w 7 —b -d 0 f 
zt yt zt wt —c -e -f 0 


4.4 Applications 


4.4.1 Inverse matrices and Cramer’s rule 


In Chapter 1, we studied two ways of solving a system of linear equations $A\mathbf{x} = \mathbf{b}$: (i) by Gauss-Jordan elimination (or by LDU factorization), or (ii) by using $A^{-1}$ if $A$ is invertible. The cofactor expansion of the determinant may be used to compute the inverse of an invertible matrix $A$.

For $i \ne j$, let $A^*$ be the matrix $A$ with the $j$-th row replaced by the $i$-th row. Then the determinant of $A^*$ must be zero, since $A^*$ has two equal rows, and the cofactors of $A^*$ with respect to the $j$-th row are the same as those of $A$: that is, $A^*_{jk} = A_{jk}$ for all $k = 1, \ldots, n$. Therefore, we have

$$0 = \det A^* = a^*_{j1}A^*_{j1} + a^*_{j2}A^*_{j2} + \cdots + a^*_{jn}A^*_{jn} = a_{i1}A_{j1} + a_{i2}A_{j2} + \cdots + a_{in}A_{jn}.$$

This proves the following lemma.


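Lemma 4.8

$$a_{i1}A_{j1} + a_{i2}A_{j2} + \cdots + a_{in}A_{jn} = \begin{cases} \det A & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$

Definition 4.6 If $A$ is an $n \times n$ matrix and $A_{ij}$ is the cofactor of $a_{ij}$, then the new matrix

$$\operatorname{adj}A = \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{bmatrix}$$

is called the adjoint of $A$.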


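It follows from Lemma 4.8 that

$$A\cdot\operatorname{adj}A = \begin{bmatrix} \det A & 0 & \cdots & 0 \\ 0 & \det A & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \det A \end{bmatrix} = (\det A)I.$$

If $A$ is invertible, then $\det A \ne 0$ and we have $A\left(\frac{1}{\det A}\operatorname{adj}A\right) = I$. Thus

$$A^{-1} = \frac{1}{\det A}\operatorname{adj}A, \qquad\text{and}\qquad A = (\det A)\operatorname{adj}(A^{-1}),$$

where the second formula follows from the first by replacing $A$ with $A^{-1}$.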
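The adjoint formula can be checked mechanically. In the Python sketch below (all function names are our own), the $(i, j)$ entry of `adjoint(a)` is the cofactor $A_{ji}$, note the transpose, and the inverse is formed with exact `fractions.Fraction` entries. The test matrix is the coefficient matrix of Example 4.8.

```python
from fractions import Fraction


def minor(a, i, j):
    """Delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Cofactor expansion along the first row."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det(minor(a, 0, j)) for j in range(len(a)))


def adjoint(a):
    """adj A: its (i, j) entry is the cofactor A_ji = (-1)**(i+j) det M_ji."""
    n = len(a)
    return [[(-1) ** (i + j) * det(minor(a, j, i)) for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def inverse(a):
    """A^{-1} = adj A / det A, with exact rational entries."""
    d = det(a)
    if d == 0:
        raise ValueError("A is not invertible (det A = 0)")
    return [[Fraction(entry, d) for entry in row] for row in adjoint(a)]


A = [[1, 2, 1], [2, 2, 1], [1, 2, 3]]
print(det(A))                 # -4
print(matmul(A, adjoint(A)))  # [[-4, 0, 0], [0, -4, 0], [0, 0, -4]]
```

This follows the text's formula literally; as noted for Cramer's rule, it is far more expensive than elimination once $n$ is moderately large.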


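Example 4.7 For a matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$, $\operatorname{adj}A = \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}$, and if $\det A = ad - bc \ne 0$, then

$$A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$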


Problem 4.14 Compute $\operatorname{adj}A$ and $A^{-1}$ for $A = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 1 & 1 \\ 2 & -2 & 1 \end{bmatrix}$.


Problem 4.15 Show that $A$ is invertible if and only if $\operatorname{adj}A$ is invertible, and that if $A$ is invertible, then

$$(\operatorname{adj}A)^{-1} = \frac{A}{\det A} = \operatorname{adj}(A^{-1}).$$


Problem 4.16 Let $A$ be an $n \times n$ matrix with $n > 1$. Show that

(1) $\det(\operatorname{adj}A) = (\det A)^{n-1}$;

(2) $\operatorname{adj}(\operatorname{adj}A) = (\det A)^{n-2}A$, if $A$ is invertible.

The following formula may not be useful for practical computation, but it can be a useful tool in theoretical arguments.


Theorem 4.9 (Cramer's rule) Let $A\mathbf{x} = \mathbf{b}$ be a system of $n$ linear equations in $n$ unknowns such that $\det A \ne 0$. Then the system has the unique solution given by

$$x_j = \frac{\det C_j}{\det A}, \qquad j = 1, 2, \ldots, n,$$

where $C_j$ is the matrix obtained from $A$ by replacing the $j$-th column of $A$ by the column matrix $\mathbf{b} = [b_1\ b_2\ \cdots\ b_n]^T$.


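Proof: If $\det A \ne 0$, then $A$ is invertible and $\mathbf{x} = A^{-1}\mathbf{b}$ is the unique solution of $A\mathbf{x} = \mathbf{b}$. Since

$$\mathbf{x} = A^{-1}\mathbf{b} = \frac{1}{\det A}(\operatorname{adj}A)\mathbf{b},$$

it follows that

$$x_j = \frac{b_1A_{1j} + b_2A_{2j} + \cdots + b_nA_{nj}}{\det A} = \frac{\det C_j}{\det A},$$

because the numerator is exactly the cofactor expansion of $\det C_j$ along its $j$-th column.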


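Example 4.8 Use Cramer's rule to solve

$$\begin{aligned} x_1 + 2x_2 + x_3 &= 50 \\ 2x_1 + 2x_2 + x_3 &= 60 \\ x_1 + 2x_2 + 3x_3 &= 90. \end{aligned}$$

Solution:

$$A = \begin{bmatrix} 1 & 2 & 1 \\ 2 & 2 & 1 \\ 1 & 2 & 3 \end{bmatrix}, \quad C_1 = \begin{bmatrix} 50 & 2 & 1 \\ 60 & 2 & 1 \\ 90 & 2 & 3 \end{bmatrix}, \quad C_2 = \begin{bmatrix} 1 & 50 & 1 \\ 2 & 60 & 1 \\ 1 & 90 & 3 \end{bmatrix}, \quad C_3 = \begin{bmatrix} 1 & 2 & 50 \\ 2 & 2 & 60 \\ 1 & 2 & 90 \end{bmatrix}.$$

Therefore, since $\det A = -4$, $\det C_1 = -40$, $\det C_2 = -40$ and $\det C_3 = -80$,

$$x_1 = \frac{\det C_1}{\det A} = 10, \qquad x_2 = \frac{\det C_2}{\det A} = 10, \qquad x_3 = \frac{\det C_3}{\det A} = 20.$$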
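Here is a minimal Python sketch of Theorem 4.9 (the function `cramer` and its error message are our own). It builds each $C_j$ by replacing column $j$ with $\mathbf{b}$ and returns exact `fractions.Fraction` values; the determinants use a cofactor expansion along the first row, which is adequate only for small systems.

```python
from fractions import Fraction


def det(a):
    """Cofactor expansion along the first row (adequate for small systems)."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def cramer(a, b):
    """Solve Ax = b via x_j = det C_j / det A, C_j = A with column j replaced by b."""
    d = det(a)
    if d == 0:
        raise ValueError("det A = 0, so Cramer's rule does not apply")
    solution = []
    for j in range(len(a)):
        c_j = [row[:j] + [b_i] + row[j + 1:] for row, b_i in zip(a, b)]
        solution.append(Fraction(det(c_j), d))
    return solution


A = [[1, 2, 1], [2, 2, 1], [1, 2, 3]]
b = [50, 60, 90]
print(cramer(A, b))  # [Fraction(10, 1), Fraction(10, 1), Fraction(20, 1)]
```

Solving requires $n + 1$ determinants of order $n$, which is why the text recommends Gauss-Jordan elimination for actual computation.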




Cramer’s rule provides a convenient method for writing down the solu- 
tion of a system of n linear equations in n unknowns in terms of determi- 
nants. To find the solution, however, one must evaluate n+1 determinants of 
order n. Evaluating even two of these determinants generally involves more 
computations than solving the system by using Gauss-Jordan elimination. 


Problem 4.17 Use Cramer’s rule to solve the systems 


4tg + 323 = -2 
(1) 321 + An + 523 = 6 
—27, + 522 — 273 = 1. 
2 3 5 
- - - + - = 3 
£ yY z 
4 7 2 
(2) —-- + - + - = 0 
£ yY 2 
2 1 
- - = = 2. 
yY z 


Problem 4.18 Let $A$ be the matrix obtained from the identity matrix $I_n$ by replacing its $i$-th column with the column vector $\mathbf{x} = [x_1 \cdots x_n]^T$. Compute $\det A$.


4.4.2 Area and volume 


In this section, we show that a geometric meaning of the determinant of a square matrix $A$ is the volume (or the area, for $n = 2$) of the parallelepiped spanned by the row (or column) vectors of $A$. For this, we restrict our attention to the cases $n = 2$ and $3$, although the same argument can be applied for $n > 3$.

For an $n \times n$ square matrix $A$, the rows $\mathbf{r}_i = [a_{i1} \cdots a_{in}]$, $i = 1, 2, \ldots, n$, of $A$ are vectors in $\mathbb{R}^n$, and so they may constitute the edges of a parallelepiped:

$$P(A) = \left\{\sum_{i=1}^{n} t_i\mathbf{r}_i : 0 \le t_i \le 1,\ i = 1, 2, \ldots, n\right\}.$$

It is called a parallelogram if $n = 2$. Note that a different order of the row vectors does not alter the shape of $P(A)$.


Theorem 4.10 The determinant $\det A$ of an $n \times n$ matrix $A$ is the volume of $P(A)$ up to sign. In fact, the volume of $P(A)$ is equal to $|\det A|$.


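Proof: We give a proof for the case $n = 2$ only, and leave the case $n = 3$ as an exercise. For any basis $\alpha = \{\mathbf{r}_1, \mathbf{r}_2\}$ for $\mathbb{R}^2$, we set $A = \begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix}$ and let $\mathrm{Area}(A)$ denote the area of the parallelogram $P(A)$. Note that $\mathrm{Area}\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix} = \mathrm{Area}\begin{bmatrix} \mathbf{r}_2 \\ \mathbf{r}_1 \end{bmatrix}$. On the other hand, $\det\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix} = -\det\begin{bmatrix} \mathbf{r}_2 \\ \mathbf{r}_1 \end{bmatrix}$. Thus, one can expect in general that

$$\det\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix} = \pm\mathrm{Area}\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix},$$

which explains why we say "up to sign". To prove this, we first define the orientation of $\alpha$ to be

$$\rho(A) = \begin{cases} \dfrac{\det A}{|\det A|} = \pm 1 & \text{if } \det A \ne 0, \\ 1 & \text{if } \det A = 0. \end{cases}$$

For example, $\rho\begin{bmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{bmatrix} = 1$ while $\rho\begin{bmatrix} \mathbf{e}_2 \\ \mathbf{e}_1 \end{bmatrix} = -1$.

In general, when the rows are independent, $\rho(A) = 1$ if and only if $\det A > 0$, in which case we say $\alpha$ is positively oriented; and $\rho(A) = -1$ if and only if $\det A < 0$, in which case we say $\alpha$ is negatively oriented.

Figure 4.2: Orientation of vectors

Theorem 4.10 is proved if we show $\det A = \rho(A)\mathrm{Area}(A)$. For this, we define

$$D(A) = \rho(A)\mathrm{Area}(A),$$

and prove that the function $D$ satisfies the rules (D1)-(D3) of the determinant. Then by the uniqueness of the determinant, $\det = D = \pm\mathrm{Area}$, or $\mathrm{Area}(A) = |\det A|$.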


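(1) It is quite clear that $D\begin{bmatrix} \mathbf{e}_1 \\ \mathbf{e}_2 \end{bmatrix} = 1$.

(2) $D\begin{bmatrix} \mathbf{r}_2 \\ \mathbf{r}_1 \end{bmatrix} = -D\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix}$. This is clear, since $\rho\begin{bmatrix} \mathbf{r}_2 \\ \mathbf{r}_1 \end{bmatrix} = -\rho\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix}$ and $\mathrm{Area}\begin{bmatrix} \mathbf{r}_2 \\ \mathbf{r}_1 \end{bmatrix} = \mathrm{Area}\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix}$.

(3) $D\begin{bmatrix} k\mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix} = k\,D\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix}$. Indeed, if $k = 0$, it is trivially true. Suppose $k \ne 0$.

Figure 4.3: The parallelogram $P(k\mathbf{r}_1, \mathbf{r}_2)$

Then, from Figure 4.3, the bottom edge $\mathbf{r}_1$ of $P(A)$ is elongated by the factor $|k|$ while the height $h$ remains unchanged. Thus

$$\mathrm{Area}\begin{bmatrix} k\mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix} = |k|\,\mathrm{Area}\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix}.$$

On the other hand, since $\det\begin{bmatrix} k\mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix} = k\det\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix}$, we have $\rho\begin{bmatrix} k\mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix} = \frac{k}{|k|}\rho\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix}$, and hence

$$D\begin{bmatrix} k\mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix} = \frac{k}{|k|}\rho\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix}\cdot|k|\,\mathrm{Area}\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix} = k\,D\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{r}_2 \end{bmatrix}.$$

(4) $D\begin{bmatrix} \mathbf{r}_1 + \mathbf{r}_2 \\ \mathbf{u} \end{bmatrix} = D\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{u} \end{bmatrix} + D\begin{bmatrix} \mathbf{r}_2 \\ \mathbf{u} \end{bmatrix}$ for any $\mathbf{u}, \mathbf{r}_i \in \mathbb{R}^2$, $i = 1, 2$.

If $\mathbf{u} = \mathbf{0}$, there is nothing to prove. Assume that $\mathbf{u} \ne \mathbf{0}$. Choose any vector $\mathbf{v} \in \mathbb{R}^2$ such that $\{\mathbf{u}, \mathbf{v}\}$ is a basis for $\mathbb{R}^2$. Then $\mathbf{r}_i = a_i\mathbf{u} + b_i\mathbf{v}$, $i = 1, 2$, and

$$D\begin{bmatrix} \mathbf{r}_1 + \mathbf{r}_2 \\ \mathbf{u} \end{bmatrix} = D\begin{bmatrix} (a_1 + a_2)\mathbf{u} + (b_1 + b_2)\mathbf{v} \\ \mathbf{u} \end{bmatrix} = D\begin{bmatrix} (b_1 + b_2)\mathbf{v} \\ \mathbf{u} \end{bmatrix} = (b_1 + b_2)D\begin{bmatrix} \mathbf{v} \\ \mathbf{u} \end{bmatrix} = b_1D\begin{bmatrix} \mathbf{v} \\ \mathbf{u} \end{bmatrix} + b_2D\begin{bmatrix} \mathbf{v} \\ \mathbf{u} \end{bmatrix} = D\begin{bmatrix} \mathbf{r}_1 \\ \mathbf{u} \end{bmatrix} + D\begin{bmatrix} \mathbf{r}_2 \\ \mathbf{u} \end{bmatrix}.$$

The second equality follows from the following picture: adding a multiple of $\mathbf{u}$ to the first row changes neither the area of the parallelogram (the base $\mathbf{u}$ and the height stay the same) nor its orientation (the determinant stays the same). The remaining equalities follow from (3) and the same observation.

Figure 4.4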

Remark: (1) For $n \ge 3$, the volume of $P(A)$ can be defined by induction on $n$, and exactly the same argument as in the proof can be applied to show that the volume is $|\det A|$.

(2) Note that if we construct the parallelepiped $P(A)$ using the column vectors of $A$, then its shape is in general totally different from the one constructed using the row vectors. However, $\det A = \det A^T$ means that their volumes are the same, which is a totally nontrivial fact.

(3) For an $m \times n$ matrix $A$ with rank $n$, let $\{\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_n\}$ be the $n$ column vectors of $A$. They constitute an $n$-dimensional parallelepiped in $\mathbb{R}^m$:

$$P(A) = \left\{\sum_{i=1}^{n} t_i\mathbf{c}_i : 0 \le t_i \le 1,\ i = 1, 2, \ldots, n\right\}.$$

A formula for the volume of this parallelepiped may also be expressed by a determinant, as follows.

We demonstrate this for a two-dimensional parallelepiped (a parallelogram) determined by two column vectors $\mathbf{c}_1$ and $\mathbf{c}_2$ of $A = [\mathbf{c}_1\ \mathbf{c}_2]$ in $\mathbb{R}^3$.


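Figure 4.5: A parallelogram in $\mathbb{R}^3$

The area of this parallelogram is simply $\mathrm{Area}(P(A)) = \|\mathbf{c}_1\|h$, where $h = \|\mathbf{c}_2\|\sin\theta$ and $\theta$ is the angle between $\mathbf{c}_1$ and $\mathbf{c}_2$. Therefore, we have

$$\begin{aligned} \mathrm{Area}(P(A))^2 &= \|\mathbf{c}_1\|^2\|\mathbf{c}_2\|^2\sin^2\theta = \|\mathbf{c}_1\|^2\|\mathbf{c}_2\|^2(1 - \cos^2\theta) \\ &= (\mathbf{c}_1\cdot\mathbf{c}_1)(\mathbf{c}_2\cdot\mathbf{c}_2)\left(1 - \frac{(\mathbf{c}_1\cdot\mathbf{c}_2)^2}{(\mathbf{c}_1\cdot\mathbf{c}_1)(\mathbf{c}_2\cdot\mathbf{c}_2)}\right) \\ &= (\mathbf{c}_1\cdot\mathbf{c}_1)(\mathbf{c}_2\cdot\mathbf{c}_2) - (\mathbf{c}_1\cdot\mathbf{c}_2)^2 = \det\begin{bmatrix} \mathbf{c}_1\cdot\mathbf{c}_1 & \mathbf{c}_1\cdot\mathbf{c}_2 \\ \mathbf{c}_2\cdot\mathbf{c}_1 & \mathbf{c}_2\cdot\mathbf{c}_2 \end{bmatrix} = \det(A^TA). \end{aligned}$$

In general, let $\mathbf{c}_1, \ldots, \mathbf{c}_n$ be the $n$ column vectors of an $m \times n$ matrix $A$. Then one can show (for a proof see Exercise 5.16) that the volume of the $n$-dimensional parallelepiped $P(A)$ determined by those $n$ column vectors $\mathbf{c}_j$ in $\mathbb{R}^m$ is

$$\mathrm{vol}(P(A)) = \sqrt{\det(A^TA)}.$$

In particular, if $A$ is an $m \times m$ square matrix, then

$$\mathrm{vol}(P(A)) = \sqrt{\det(A^TA)} = \sqrt{\det(A^T)\det(A)} = |\det(A)|,$$

as expected.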
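The volume formula can be tried numerically. In the Python sketch below (function names are ours), the entries of $A^TA$ are computed as the dot products $\mathbf{c}_i\cdot\mathbf{c}_j$ of the given columns, so $A$ itself never has to be assembled. For a square $A$ the result agrees with $|\det A|$, and for the columns $(1, 1, 0)$ and $(0, 1, 1)$ in $\mathbb{R}^3$ it agrees with the length $\sqrt{3}$ of their cross product $(1, -1, 1)$.

```python
import math


def det(a):
    """Cofactor expansion along the first row."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def volume(columns):
    """vol P(A) = sqrt(det(A^T A)); the (i, j) entry of A^T A is c_i . c_j."""
    gram = [[dot(ci, cj) for cj in columns] for ci in columns]
    return math.sqrt(det(gram))


print(volume([(2, 1), (1, 3)]))        # 5.0, and |det [[2, 1], [1, 3]]| = 5
print(volume([(1, 1, 0), (0, 1, 1)]))  # sqrt(3), about 1.732
```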


Problem 4.19 Show that the area of a triangle $ABC$ in the plane $\mathbb{R}^2$, where $A = (x_1, y_1)$, $B = (x_2, y_2)$, $C = (x_3, y_3)$, is equal to the absolute value of

$$\frac{1}{2}\det\begin{bmatrix} x_1 & y_1 & 1 \\ x_2 & y_2 & 1 \\ x_3 & y_3 & 1 \end{bmatrix}.$$




4.5 Exercises 


4.1. Determine the values of k for which det | ; k | = 

4.2. Evaluate $\det(A^2BA^{-1})$ and $\det(B^{-1}A^2)$ for the following matrices:

$$A = \begin{bmatrix} 1 & -2 & 3 \\ -2 & 3 & 1 \\ 0 & 1 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 3 & 0 & 2 \\ 3 & -2 & 5 \\ 2 & 1 & 3 \end{bmatrix}.$$

4.3. Evaluate the determinant of

$$A = \begin{bmatrix} 3 & -2 & -5 & 4 \\ -5 & 2 & 8 & -5 \\ -3 & 4 & 7 & -3 \\ 2 & -3 & -5 & 8 \end{bmatrix}.$$

4.4. Evaluate $\det A$ for an $n \times n$ matrix $A = [a_{ij}]$ when

(1) $a_{ij} = \begin{cases} 1 & \text{if } i \ne j, \\ 0 & \text{if } i = j, \end{cases}$ (2) $a_{ij} = i + j$.

4.5. Find all solutions of the equation $\det(AB) = 0$ for
_ | @+2 3z elisa 0
A=| 3 ene Be o

4.6. Prove that if $A$ is an $n \times n$ skew-symmetric matrix and $n$ is odd, then $\det A = 0$. Give an example of a $4 \times 4$ skew-symmetric matrix $A$ with $\det A \ne 0$.

4.7. Use the determinant function to find

(1) the area of the parallelogram with edges determined by $(4, 3)$ and $(7, 5)$,

(2) the volume of the parallelepiped with edges determined by the vectors $(1, 0, 4)$, $(0, -2, 2)$ and $(3, 1, -1)$.

4.8. Use Cramer's rule to solve each system.
= a + t2 + & = 2
Oy ana eae (2) 4 a + 2m + 23 = 2
1 = i a) + 3r2 — z3 = —4
| = BS + v4 = —1
Ly + T3 = 3
(3) Ly LQ £3 Bir = 2
| z + t2 + t3 + T4 = 0.

4.9. Use Cramer's rule to solve the given system:
1 2 -1 —1
EEE y)|23 alx=| 2
0 1 5 0


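4.10. Find a constant $k$ so that the system of linear equations

$$\begin{aligned} kx - 2y - z &= 0 \\ (k + 1)y + 4z &= 0 \\ (k - 1)z &= 0 \end{aligned}$$

has more than one solution. (Is it possible to apply Cramer's rule here?)

4.11. Solve the following system of linear equations by using Cramer's rule and by using Gaussian elimination:
1111 1
O S -2
E ae oa cae) (re
111 2 4

4.12. Solve the following system of equations by using Cramer's rule:
3z + 2y = 3z + 1
3z + 22 = 8 — yy
3z - 1l = z — y.

4.13. Calculate the cofactors $A_{11}$, $A_{12}$, $A_{13}$ and $A_{33}$ for the matrix $A$:

$$\text{(1) } A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 3 \\ 2 & 1 & 1 \end{bmatrix}, \quad \text{(2) } A = \begin{bmatrix} 1 & 4 & 0 \\ 1 & 0 & 2 \\ 3 & 1 & 2 \end{bmatrix}, \quad \text{(3) } A = \begin{bmatrix} 2 & -1 & 3 \\ -1 & 2 & 2 \\ 3 & 2 & 1 \end{bmatrix}.$$

4.14. Let $A$ be the $n \times n$ matrix whose entries are all 1. Show that

(1) $\det(A - nI_n) = 0$.

(2) $(A - nI_n)_{ij} = (-1)^{n-1}n^{n-2}$ for all $i, j$, where $(A - nI_n)_{ij}$ denotes the cofactor of the $(i, j)$-entry of $A - nI_n$.

4.15. Show that if $A$ is symmetric, then so is $\operatorname{adj}A$. Moreover, if $A$ is invertible, then the inverse of $A$ is also symmetric.

4.16. Use the adjoint formula to compute the inverse of each of the following matrices:

$$A = \begin{bmatrix} -2 & 3 & 2 \\ 6 & 0 & 3 \\ 4 & 1 & -1 \end{bmatrix}, \qquad B = \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix}.$$

4.17. Compute $\operatorname{adj}A$, $\det A$, $\det(\operatorname{adj}A)$, $A^{-1}$, and verify $A\cdot\operatorname{adj}A = (\det A)I$ for

$$\text{(1) } A = \begin{bmatrix} 2 & 1 & 3 \\ -1 & 2 & 0 \\ 3 & -2 & 1 \end{bmatrix}, \qquad \text{(2) } A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 1 & 5 & 7 \end{bmatrix}.$$

4.18. Show that, if $A$ and $B$ are invertible matrices, then $\operatorname{adj}(AB) = \operatorname{adj}B\,\operatorname{adj}A$.

4.19. Find the area of the triangle with vertices at $(0, 0)$, $(1, 3)$ and $(3, 1)$ in $\mathbb{R}^2$.

4.20. Find the area of the triangle with vertices at $(0, 0, 0)$, $(1, 1, 2)$ and $(2, 2, 1)$ in $\mathbb{R}^3$.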


4.21. Let $A = \begin{bmatrix} B & C \\ 0 & D \end{bmatrix}$ for $B \in M_k(\mathbb{R})$ and $D \in M_\ell(\mathbb{R})$ with $k + \ell = n$. Show that $\det A = \det B\det D$. But, in general, even if $k = \ell$, $\det\begin{bmatrix} B & C \\ E & D \end{bmatrix} \ne \det B\det D - \det C\det E$.

4.22. For an $m \times n$ matrix $A$ and an $n \times m$ matrix $B$, show that

$$\det\begin{bmatrix} 0 & A \\ -B & I \end{bmatrix} = \det(AB).$$

4.23. Let $A$ be an $n \times n$ matrix, regarded as a linear transformation on the $n$-space $\mathbb{R}^n$ by the matrix multiplication $A\mathbf{x}$ for any $\mathbf{x} \in \mathbb{R}^n$. Suppose that $\mathbf{r}_1, \mathbf{r}_2, \ldots, \mathbf{r}_n$ are linearly independent vectors in $\mathbb{R}^n$ constituting a parallelepiped (see Remark (2) on page 168). Then $A$ transforms this parallelepiped into another parallelepiped determined by $A\mathbf{r}_1, A\mathbf{r}_2, \ldots, A\mathbf{r}_n$. Suppose that we denote the $n \times n$ matrix whose $j$-th column is $\mathbf{r}_j$ by $B$, and the $n \times n$ matrix whose $j$-th column is $A\mathbf{r}_j$ by $C$. Prove that

$$\mathrm{vol}(P(C)) = |\det A|\,\mathrm{vol}(P(B)).$$

(This means that, for a square matrix $A$ considered as a linear transformation, the absolute value of the determinant of $A$ is the ratio of the volume of the image parallelepiped $P(C)$ to the volume of $P(B)$. If $\det A = 0$, then the image $P(C)$ is a parallelepiped in a subspace of dimension less than $n$.)

4.24. Determine whether or not the following statements are true in general, and justify your answers.

(1) For any square matrices $A$ and $B$ of the same size, $\det(A + B) = \det A + \det B$.

(2) For any square matrices $A$ and $B$ of the same size, $\det(AB) = \det(BA)$.

(3) If $A$ is an $n \times n$ square matrix, then for any scalar $c$, $\det(cI_n - A) = c^n - \det A$.

(4) If $A$ is an $n \times n$ square matrix, then for any scalar $c$, $\det(cI_n - A^T) = \det(cI_n - A)$.

(5) If $E$ is an elementary matrix, then $\det E = \pm 1$.

(6) There is no matrix $A$ of order 3 such that $A^2 = -I_3$.

(7) Let $A$ be a nilpotent matrix, i.e., $A^k = 0$ for some natural number $k$. Then $\det A = 0$.

(8) $\det(kA) = k\det A$ for any square matrix $A$.

(9) Any system $A\mathbf{x} = \mathbf{b}$ has a solution if and only if $\det A \ne 0$.

(10) For any $n \times 1$ column vectors $\mathbf{u}$ and $\mathbf{v}$ with $n \ge 2$, $\det(\mathbf{u}\mathbf{v}^T) = 0$.

(11) If $A$ is a square matrix with $\det A = 1$, then $\operatorname{adj}(\operatorname{adj}A) = A$.

(12) If the entries of $A$ are all integers and $\det A = 1$ or $-1$, then the entries of $A^{-1}$ are also integers.

(13) If the entries of $A$ are 0's or 1's, then $\det A = 1$, $0$, or $-1$.




(14) Every system of n linear equations in n unknowns can be solved by 
Cramer’s rule. 
(15) If A is a permutation matrix, then A? = A. 


Chapter 5 


Inner Product Spaces 


5.1 Inner products 


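One more important mathematical structure that the 3-space $\mathbb{R}^3$ has is the magnitude of vectors, which is induced by the dot (or Euclidean inner) product:

$$\mathbf{x}\cdot\mathbf{y} = x_1y_1 + x_2y_2 + x_3y_3 = \mathbf{x}^T\mathbf{y}$$

for two vectors $\mathbf{x} = (x_1, x_2, x_3)$ and $\mathbf{y} = (y_1, y_2, y_3)$ in $\mathbb{R}^3$. The length (or magnitude) of a vector $\mathbf{x} = (x_1, x_2, x_3)$ is then defined as

$$\|\mathbf{x}\| = \sqrt{\mathbf{x}\cdot\mathbf{x}} = \sqrt{x_1^2 + x_2^2 + x_3^2},$$

and the Euclidean distance of two vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^3$ is defined by

$$d(\mathbf{x}, \mathbf{y}) = \|\mathbf{x} - \mathbf{y}\|.$$

In this way, the dot product plays the role of a ruler for measuring the length of a line segment in $\mathbb{R}^3$. Furthermore, it can also be used to measure the angle between two vectors: in fact, the angle $\theta$ between two nonzero vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^3$ is measured by the formula involving the dot product

$$\cos\theta = \frac{\mathbf{x}\cdot\mathbf{y}}{\|\mathbf{x}\|\,\|\mathbf{y}\|}, \qquad 0 \le \theta \le \pi,$$

since the dot product satisfies the formula

$$\mathbf{x}\cdot\mathbf{y} = \|\mathbf{x}\|\,\|\mathbf{y}\|\cos\theta.$$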




In particular, two vectors $\mathbf{x}$ and $\mathbf{y}$ are orthogonal (i.e., they form a right angle $\theta = \pi/2$) if and only if the Pythagorean theorem holds:

$$\|\mathbf{x}\|^2 + \|\mathbf{y}\|^2 = \|\mathbf{x} + \mathbf{y}\|^2,$$

which is equivalent to

$$\mathbf{x}\cdot\mathbf{y} = x_1y_1 + x_2y_2 + x_3y_3 = 0.$$

Note that Euclidean geometry begins with the vector space $\mathbb{R}^3$ together with this dot product, by which the Euclidean distance is defined.

The dot product has a direct extension to the $n$-space $\mathbb{R}^n$ for any positive integer $n$: for vectors $\mathbf{x} = (x_1, \ldots, x_n)$ and $\mathbf{y} = (y_1, \ldots, y_n)$ in $\mathbb{R}^n$, the dot product, also called the Euclidean inner product, and the length (or magnitude) of a vector are defined similarly as

$$\mathbf{x}\cdot\mathbf{y} = x_1y_1 + \cdots + x_ny_n = \mathbf{x}^T\mathbf{y}, \qquad \|\mathbf{x}\| = (\mathbf{x}\cdot\mathbf{x})^{1/2} = \sqrt{x_1^2 + \cdots + x_n^2}.$$

In fact, the entries of the product $AB$ of two matrices $A$ and $B$ are simply the dot products of the rows of $A$ and the columns of $B$. In particular, those of $A^TA$ are the dot products of the columns of $A$.
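These definitions are straightforward to implement for vectors of any length $n$. The Python sketch below (function names are ours) computes the dot product, length, distance and angle. The clamp inside `angle` is our own safeguard: rounding can push the computed cosine slightly outside $[-1, 1]$, where `math.acos` would fail.

```python
import math


def dot(x, y):
    """Euclidean inner product x . y = x1*y1 + ... + xn*yn."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    return math.sqrt(dot(x, x))


def distance(x, y):
    return norm([a - b for a, b in zip(x, y)])


def angle(x, y):
    """theta in [0, pi] with cos(theta) = x.y / (||x|| ||y||); x, y nonzero."""
    c = dot(x, y) / (norm(x) * norm(y))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error


print(norm((3, 4)))                         # 5.0
print(distance((1, 2, 3), (4, 6, 3)))       # 5.0
print(math.degrees(angle((1, 0), (1, 1))))  # about 45
```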

In order to extend this notion of dot product to vector spaces in general, we extract the most essential properties that the dot product in $\mathbb{R}^n$ satisfies and take these properties as axioms for an inner product on a vector space $V$. First of all, we note that the dot product is a rule that assigns a real number $\mathbf{x}\cdot\mathbf{y}$ to each pair of vectors $\mathbf{x}$ and $\mathbf{y}$ in $\mathbb{R}^n$, and the essential rules it satisfies are those in the following definition.


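Definition 5.1 An inner product on a real vector space $V$ is a function that associates a real number $\langle\mathbf{x}, \mathbf{y}\rangle$ to each pair of vectors $\mathbf{x}$ and $\mathbf{y}$ in $V$ in such a way that the following rules are satisfied for all vectors $\mathbf{x}$, $\mathbf{y}$ and $\mathbf{z}$ in $V$ and all scalars $k$ in $\mathbb{R}$:

(1) $\langle\mathbf{x}, \mathbf{y}\rangle = \langle\mathbf{y}, \mathbf{x}\rangle$ (Symmetry)
(2) $\langle\mathbf{x} + \mathbf{y}, \mathbf{z}\rangle = \langle\mathbf{x}, \mathbf{z}\rangle + \langle\mathbf{y}, \mathbf{z}\rangle$ (Additivity)
(3) $\langle k\mathbf{x}, \mathbf{y}\rangle = k\langle\mathbf{x}, \mathbf{y}\rangle$ (Homogeneity)
(4) $\langle\mathbf{x}, \mathbf{x}\rangle \ge 0$, and $\langle\mathbf{x}, \mathbf{x}\rangle = 0 \iff \mathbf{x} = \mathbf{0}$ (Positive definiteness)

A pair $(V, \langle\ ,\ \rangle)$ of a (real) vector space $V$ and an inner product $\langle\ ,\ \rangle$ is called a (real) inner product space. In particular, the pair $(\mathbb{R}^n, \cdot)$ is called the Euclidean $n$-space.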


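Note that by symmetry (1), additivity (2) and homogeneity (3) also hold for the second variable, i.e.,

(2') $\langle\mathbf{x}, \mathbf{y} + \mathbf{z}\rangle = \langle\mathbf{x}, \mathbf{y}\rangle + \langle\mathbf{x}, \mathbf{z}\rangle$,
(3') $\langle\mathbf{x}, k\mathbf{y}\rangle = k\langle\mathbf{x}, \mathbf{y}\rangle$.

It is easy to show that $\langle\mathbf{0}, \mathbf{y}\rangle = \langle 0\cdot\mathbf{0}, \mathbf{y}\rangle = 0\langle\mathbf{0}, \mathbf{y}\rangle = 0$, and $\langle\mathbf{x}, \mathbf{0}\rangle = 0$.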


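Example 5.1 (Non-Euclidean inner product on $\mathbb{R}^2$) For vectors $x = (x_1, x_2)$ and $y = (y_1, y_2)$ in $\mathbb{R}^2$, define
$$\langle x, y\rangle = ax_1y_1 + c(x_1y_2 + x_2y_1) + bx_2y_2 = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a & c \\ c & b \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix},$$
where $a$, $b$ and $c$ are real numbers. Then this function $\langle\ ,\ \rangle$ clearly satisfies the first three rules of the inner product. Moreover, one can easily verify that it also satisfies rule (4) if and only if $a > 0$ and $ab - c^2 > 0$ hold (see Problem 5.1). When $c = 0$, it reduces to $\langle x, y\rangle = ax_1y_1 + bx_2y_2$. Notice also that $a = \langle e_1, e_1\rangle$, $b = \langle e_2, e_2\rangle$ and $c = \langle e_1, e_2\rangle = \langle e_2, e_1\rangle$.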


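Problem 5.1 Prove that $A = \begin{bmatrix} a & c \\ c & b \end{bmatrix}$ defines an inner product on $\mathbb{R}^2$ via
$$\langle x, y\rangle = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} a & c \\ c & b \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = x^TAy$$
if and only if $a > 0$ and $ab - c^2 > 0$ hold.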
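The criterion of Problem 5.1 can be explored numerically. The sketch below is an illustration, not a proof; the function names and the sample values of $a$, $b$, $c$ are our own choices. It evaluates the form of Example 5.1 and tests $a > 0$, $ab - c^2 > 0$. For $a = b = 1$, $c = 2$ the matrix is symmetric but the condition fails, and indeed the nonzero vector $x = (1, -1)$ gets $\langle x, x\rangle < 0$.

```python
def form(a, b, c, x, y):
    """<x, y> = a x1 y1 + c (x1 y2 + x2 y1) + b x2 y2, as in Example 5.1."""
    return a * x[0] * y[0] + c * (x[0] * y[1] + x[1] * y[0]) + b * x[1] * y[1]

def is_inner_product(a, b, c):
    """Condition stated in Problem 5.1: a > 0 and ab - c^2 > 0."""
    return a > 0 and a * b - c * c > 0

print(is_inner_product(2, 3, 1))         # True:  2*3 - 1 = 5 > 0
print(is_inner_product(1, 1, 2))         # False: 1*1 - 4 = -3 < 0
# With a = b = 1, c = 2, positive definiteness fails for x = (1, -1):
print(form(1, 1, 2, (1, -1), (1, -1)))   # -2
```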


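Example 5.2 Let $V = C[0, 1]$ be the vector space of all real-valued continuous functions on $[0, 1]$. For any two functions $f$ and $g$ in $V$, define
$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx.$$
Then $\langle\ ,\ \rangle$ is an inner product on $V$ (verify this). Let
$$f(x) = \begin{cases} 1 - 2x & \text{if } 0 \le x \le \tfrac12, \\ 0 & \text{if } \tfrac12 \le x \le 1, \end{cases} \qquad g(x) = \begin{cases} 0 & \text{if } 0 \le x \le \tfrac12, \\ 2x - 1 & \text{if } \tfrac12 \le x \le 1. \end{cases}$$
Then $f \ne 0 \ne g$, but $\langle f, g\rangle = 0$, since $f(x)g(x) = 0$ for every $x$ in $[0, 1]$.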
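Example 5.2 can also be checked numerically. The sketch below approximates $\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$ by the midpoint rule with `n` subintervals; the quadrature rule and the default `n = 1000` are our own assumptions, not part of the text. Because the product $f(x)g(x)$ vanishes at every sample point, the approximation of $\langle f, g\rangle$ is exactly zero, while $\langle f, f\rangle$ comes out close to its exact value $\int_0^{1/2}(1 - 2x)^2\,dx = 1/6$.

```python
def f(x):
    # f(x) = 1 - 2x on [0, 1/2], and 0 on [1/2, 1]
    return 1 - 2 * x if x <= 0.5 else 0.0

def g(x):
    # g(x) = 0 on [0, 1/2], and 2x - 1 on [1/2, 1]
    return 0.0 if x <= 0.5 else 2 * x - 1

def inner(u, v, n=1000):
    """Midpoint-rule approximation of <u, v> = integral_0^1 u(x) v(x) dx."""
    h = 1.0 / n
    return h * sum(u((k + 0.5) * h) * v((k + 0.5) * h) for k in range(n))

print(inner(f, g))   # 0.0, since f(x) g(x) = 0 at every sample point
print(inner(f, f))   # close to 1/6
```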


Example 5.3 For any diagonal matrix $A = \begin{bmatrix} d_1 & 0 & 0 \\ 0 & d_2 & 0 \\ 0 & 0 & d_3 \end{bmatrix}$ with all $d_i > 0$,
$$\langle x, y\rangle = x^TAy = d_1x_1y_1 + d_2x_2y_2 + d_3x_3y_3$$
defines an inner product on $\mathbb{R}^3$. Thus, there are infinitely many inner products on $\mathbb{R}^3$.


178 Chapter 5. Inner Product Spaces 


By a subspace W of an inner product space V, we mean a subspace of 
the vector space V together with the inner product that is the restriction of 
the inner product of V to W. 


Example 5.4 [Two different inner products] The set $W = D^1[0, 1]$ of all real-valued differentiable functions on $[0, 1]$ is a subspace of $V = C[0, 1]$. The restriction to $W$ of the inner product on $V$ defined in Example 5.2 makes $W$ an inner product subspace of $V$. However, suppose we define another inner product on $W$ by the following formula: for any two functions $f(x)$ and $g(x)$ in $W$,
$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx + \int_0^1 f'(x)g'(x)\,dx.$$
Then $\langle\ ,\ \rangle$ is also an inner product on $W$, which is different from the restriction to $W$ of the inner product of $V$, and hence $W$ with this new inner product is not a subspace of the inner product space $V$.


5.2 Matrix representations of inner products 


Let $A$ be an $n \times n$ diagonal matrix with positive diagonal entries. Then one can show that $\langle x, y\rangle = x^TAy$ defines an inner product on the n-space $\mathbb{R}^n$, as in Example 5.3. On the other hand, every inner product on a vector space can be expressed in such a matrix product form. Let $(V, \langle\ ,\ \rangle)$ be an inner product space, and let $\alpha = \{v_1, \ldots, v_n\}$ be a fixed ordered basis for $V$. Then for any $x = \sum_{i=1}^n x_iv_i$ and $y = \sum_{j=1}^n y_jv_j$ in $V$,
$$\langle x, y\rangle = \sum_{i=1}^n \sum_{j=1}^n x_iy_j\langle v_i, v_j\rangle$$
holds. If we set $a_{ij} = \langle v_i, v_j\rangle$ for $i, j = 1, \ldots, n$, then these numbers constitute a symmetric matrix $A = [a_{ij}]$, since $\langle v_i, v_j\rangle = \langle v_j, v_i\rangle$. Thus, in matrix notation, the inner product may be written as
$$\langle x, y\rangle = \sum_{i=1}^n \sum_{j=1}^n x_ia_{ij}y_j = [x]_\alpha^TA[y]_\alpha.$$

The matrix $A$ is called the matrix representation of the inner product with respect to the basis $\alpha$.
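The construction of the matrix representation is easy to mechanize. The following sketch builds $A = [\langle v_i, v_j\rangle]$ and checks that $[x]_\alpha^TA[y]_\alpha$ reproduces $\langle x, y\rangle$. The helper names are hypothetical, and the dot product on $\mathbb{R}^2$ together with the non-orthonormal basis $v_1 = (1, 0)$, $v_2 = (1, 1)$ are illustrative choices.

```python
def dot(x, y):
    """Dot product on R^n."""
    return sum(xi * yi for xi, yi in zip(x, y))

def gram_matrix(basis, ip):
    """Matrix representation A = [a_ij] with a_ij = <v_i, v_j>."""
    return [[ip(vi, vj) for vj in basis] for vi in basis]

def combine(coords, basis):
    """The vector coords[0] * v_1 + ... + coords[n-1] * v_n."""
    dim = len(basis[0])
    return [sum(c * v[k] for c, v in zip(coords, basis)) for k in range(dim)]

def bilinear(cx, A, cy):
    """[x]^T A [y] for coordinate vectors cx = [x]_alpha and cy = [y]_alpha."""
    return sum(cx[i] * A[i][j] * cy[j] for i in range(len(cx)) for j in range(len(cy)))

basis = [[1, 0], [1, 1]]            # hypothetical ordered basis alpha of R^2
A = gram_matrix(basis, dot)         # [[1, 1], [1, 2]], a symmetric matrix
cx, cy = [2, 3], [1, -1]            # coordinates of x and y with respect to alpha
x, y = combine(cx, basis), combine(cy, basis)   # x = [5, 3], y = [0, -1]
print(dot(x, y), bilinear(cx, A, cy))           # -3 -3
```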




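Example 5.5 (1) With respect to the standard basis $\{e_1, e_2, \ldots, e_n\}$ for the Euclidean n-space $\mathbb{R}^n$, the matrix representation of the dot product is just the identity matrix, since $e_i \cdot e_j = \delta_{ij}$. Thus for $x = \sum_i x_ie_i$, $y = \sum_j y_je_j \in \mathbb{R}^n$, the dot product is just the matrix product $x^Ty$:
$$x \cdot y = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} \begin{bmatrix} 1 & & 0 \\ & \ddots & \\ 0 & & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix} \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}.$$
(2) Let $V = P_2([0, 1])$, and define an inner product on $V$ as
$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx.$$
Then for a basis $\alpha = \{f_1(x) = 1, f_2(x) = x, f_3(x) = x^2\}$ for $V$, one can easily find $A = [a_{ij}]$: for instance,
$$a_{23} = \langle f_2, f_3\rangle = \int_0^1 f_2(x)f_3(x)\,dx = \int_0^1 x \cdot x^2\,dx = \frac14.$$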
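The remaining entries of $A$ in Example 5.5(2) follow the same pattern: $\langle x^i, x^j\rangle = \int_0^1 x^{i+j}\,dx = 1/(i + j + 1)$. The sketch below computes the whole $3 \times 3$ matrix exactly with Python's `fractions.Fraction`; indexing the basis by the exponents $0, 1, 2$ instead of $1, 2, 3$ is our own convention.

```python
from fractions import Fraction

def monomial_inner(i, j):
    """<x^i, x^j> = integral_0^1 x^(i+j) dx = 1/(i+j+1), as an exact fraction."""
    return Fraction(1, i + j + 1)

# alpha = {1, x, x^2}, indexed here by the exponents 0, 1, 2
A = [[monomial_inner(i, j) for j in range(3)] for i in range(3)]
for row in A:
    print([str(a) for a in row])
# ['1', '1/2', '1/3']
# ['1/2', '1/3', '1/4']
# ['1/3', '1/4', '1/5']
```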


The expression of the dot product as a matrix product is very useful in 
stating and proving theorems in the Euclidean space. 

On the other hand, for any symmetric matrix $A$ and for a fixed basis $\alpha$, the formula $\langle x, y\rangle = [x]_\alpha^TA[y]_\alpha$ seems to give rise to an inner product on $V$. In fact, the formula clearly satisfies the first three rules in the definition of the inner product, but not necessarily the fourth rule, positive definiteness. The following theorem gives a necessary condition for a symmetric matrix $A$ to give rise to an inner product. Some necessary and sufficient conditions will be discussed in Chapter 8.


Theorem 5.1 The matrix representation $A$ of an inner product (with respect to any basis) on a vector space $V$ is invertible.

Proof: Let $\alpha = \{v_1, \ldots, v_n\}$ be a basis for an inner product space $V$. A linear dependence among the column vectors of $A$ can be written as $A[x]_\alpha = 0$ for some $x \in V$. Then
$$\langle x, x\rangle = [x]_\alpha^TA[x]_\alpha = 0,$$
so $x = 0$ by positive definiteness. Thus $[x]_\alpha = 0$, i.e., the columns of $A$ are linearly independent, and $A$ is invertible by Theorem 1.9.




5.3 The lengths and angles of vectors 


In an inner product space $V$, the length of a vector and the angle between two vectors may be defined by means of the inner product, just as in the Euclidean n-space.


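Theorem 5.2 (Cauchy-Schwarz inequality) If $x$ and $y$ are vectors in an inner product space $V$, then
$$\langle x, y\rangle^2 \le \langle x, x\rangle\langle y, y\rangle.$$
Proof: If $x = 0$, it is clear. Assume $x \ne 0$. For any scalar $t$, we have
$$0 \le \langle tx + y, tx + y\rangle = \langle x, x\rangle t^2 + 2\langle x, y\rangle t + \langle y, y\rangle.$$
This inequality implies that the polynomial $\langle x, x\rangle t^2 + 2\langle x, y\rangle t + \langle y, y\rangle$ in $t$ has either no real roots or a repeated real root. Therefore, its discriminant $4\langle x, y\rangle^2 - 4\langle x, x\rangle\langle y, y\rangle$ must be nonpositive:
$$\langle x, y\rangle^2 - \langle x, x\rangle\langle y, y\rangle \le 0,$$
which implies the inequality.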


Problem 5.2 Prove that equality in the Cauchy-Schwarz inequality holds if and 
only if the vectors x and y are linearly dependent. 


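Therefore, from the Cauchy-Schwarz inequality, for nonzero vectors $x$ and $y$ we have
$$-1 \le \frac{\langle x, y\rangle}{\|x\|\,\|y\|} \le 1,$$
and so there is a unique number $\theta \in [0, \pi]$ such that $\cos\theta = \dfrac{\langle x, y\rangle}{\|x\|\,\|y\|}$.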


Definition 5.2 Let $V$ be an inner product space.
(1) The magnitude $\|x\|$ (or the length) of a vector $x$ is defined by
$$\|x\| = \sqrt{\langle x, x\rangle}.$$

(2) The distance $d(x, y)$ between two vectors $x$ and $y$ is defined by
$$d(x, y) = \|x - y\|.$$

(3) The angle between nonzero vectors $x$ and $y$ is the real number $\theta$ in the interval $[0, \pi]$ that satisfies
$$\cos\theta = \frac{\langle x, y\rangle}{\|x\|\,\|y\|},$$
or $\langle x, y\rangle = \|x\|\,\|y\|\cos\theta$.




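Example 5.6 In $\mathbb{R}^2$, let an inner product be given as $\langle x, y\rangle = 2x_1y_1 + 3x_2y_2$. The angle between $x = (1, 2)$ and $y = (1, 0)$ is computed as
$$\cos\theta = \frac{\langle x, y\rangle}{\|x\|\,\|y\|} = \frac{2}{\sqrt{14}\,\sqrt{2}} = \frac{1}{\sqrt7}.$$
Thus $\theta = \cos^{-1}\!\left(\dfrac{1}{\sqrt7}\right)$.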
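The computation in Example 5.6 is easy to reproduce. In the sketch below, `ip` encodes the inner product $\langle x, y\rangle = 2x_1y_1 + 3x_2y_2$ of the example and `angle` applies Definition 5.2(3); clamping the cosine to $[-1, 1]$ is only a precaution against floating-point rounding, not part of the definition.

```python
import math

def ip(x, y):
    """The inner product <x, y> = 2 x1 y1 + 3 x2 y2 of Example 5.6."""
    return 2 * x[0] * y[0] + 3 * x[1] * y[1]

def norm(x):
    return math.sqrt(ip(x, x))

def angle(x, y):
    """Angle theta in [0, pi] with cos(theta) = <x, y> / (||x|| ||y||)."""
    c = ip(x, y) / (norm(x) * norm(y))
    return math.acos(max(-1.0, min(1.0, c)))   # clamp against rounding error

x, y = (1, 2), (1, 0)
print(ip(x, y), ip(x, x), ip(y, y))            # 2 14 2
print(math.cos(angle(x, y)), 1 / math.sqrt(7)) # both approximately 0.378
```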
Problem 5.3 Prove the following properties of length in an inner product space $V$: for any vectors $x, y \in V$ and any scalar $k$,

(1) $\|x\| \ge 0$,

(2) $\|x\| = 0$ if and only if $x = 0$,

(3) $\|kx\| = |k|\,\|x\|$,

(4) $\|x + y\| \le \|x\| + \|y\|$ (Triangle inequality).


Problem 5.4 Let $V$ be an inner product space. Show that for any vectors $x$, $y$ and $z$ in $V$,

(1) $d(x, y) \ge 0$,

(2) $d(x, y) = 0$ if and only if $x = y$,

(3) $d(x, y) = d(y, x)$,

(4) $d(x, y) \le d(x, z) + d(z, y)$ (Triangle inequality).


Definition 5.3 Two vectors $x$ and $y$ in an inner product space are said to be orthogonal (or perpendicular) if $\langle x, y\rangle = 0$.


Note that $x = 0$ is orthogonal to every vector, and for any nonzero vectors $x$ and $y$, $\langle x, y\rangle = 0$ if and only if the angle between them is $\theta = \pi/2$, by Definition 5.2(3).


Lemma 5.3 Let $V$ be an inner product space and let $x \in V$. Then the vector $x$ is orthogonal to every vector $y$ in $V$ (i.e., $\langle x, y\rangle = 0$ for all $y$ in $V$) if and only if $x = 0$.

Proof: Suppose that $\langle x, y\rangle = 0$ for all $y$ in $V$. Then $\langle x, x\rangle = 0$ in particular. The positive definiteness of the inner product implies that $x = 0$. The converse was noted above.


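Corollary 5.4 Let $V$ be an inner product space, and let $\alpha = \{v_1, \ldots, v_n\}$ be a basis for $V$. Then a vector $x$ in $V$ is orthogonal to every basis vector $v_i$ in $\alpha$ if and only if $x = 0$.

Proof: If $\langle x, v_i\rangle = 0$ for $i = 1, \ldots, n$, then $\langle x, y\rangle = \sum_{i=1}^n y_i\langle x, v_i\rangle = 0$ for any $y = \sum_{i=1}^n y_iv_i \in V$, so $x = 0$ by Lemma 5.3. The converse is clear.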




Example 5.7 (Pythagorean theorem) Let $V$ be an inner product space, and let $x$ and $y$ be any two vectors in $V$ with the angle $\theta$. Then $\langle x, y\rangle = \|x\|\,\|y\|\cos\theta$ gives the equality
$$\|x + y\|^2 = \|x\|^2 + \|y\|^2 + 2\|x\|\,\|y\|\cos\theta.$$
Moreover, it yields the Pythagorean theorem: $\|x + y\|^2 = \|x\|^2 + \|y\|^2$ for any orthogonal vectors $x$ and $y$.


Theorem 5.5 If $x_1, x_2, \ldots, x_k$ are nonzero mutually orthogonal vectors in an inner product space $V$ (i.e., each vector is orthogonal to every other vector), then they are linearly independent.

Proof: Suppose $c_1x_1 + c_2x_2 + \cdots + c_kx_k = 0$. Then for each $i = 1, \ldots, k$,
$$\begin{aligned} 0 = \langle 0, x_i\rangle &= \langle c_1x_1 + \cdots + c_kx_k, x_i\rangle \\ &= c_1\langle x_1, x_i\rangle + \cdots + c_i\langle x_i, x_i\rangle + \cdots + c_k\langle x_k, x_i\rangle \\ &= c_i\|x_i\|^2, \end{aligned}$$
because $x_1, \ldots, x_k$ are mutually orthogonal. Since each $x_i$ is not the zero vector, $\|x_i\| \ne 0$; so $c_i = 0$ for $i = 1, \ldots, k$.


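Problem 5.5 Let $f(x)$ and $g(x)$ be continuous real-valued functions on $[0, 1]$. Prove

(1) $\left[\int_0^1 f(x)g(x)\,dx\right]^2 \le \left[\int_0^1 f(x)^2\,dx\right]\left[\int_0^1 g(x)^2\,dx\right]$,

(2) $\left[\int_0^1 (f(x) + g(x))^2\,dx\right]^{1/2} \le \left[\int_0^1 f(x)^2\,dx\right]^{1/2} + \left[\int_0^1 g(x)^2\,dx\right]^{1/2}$.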


5.4 Orthonormal bases 


The standard basis for the Euclidean n-space $\mathbb{R}^n$ has a special property: the basis vectors are mutually orthogonal and are of length 1. In this sense, it is called the rectangular coordinate system for $\mathbb{R}^n$. In an inner product space, a vector with length 1 is called a unit vector. If $x$ is a nonzero vector in an inner product space $V$, the vector $\dfrac{x}{\|x\|}$ is a unit vector. The process of obtaining a unit vector from a nonzero vector by multiplying it by the reciprocal of its length is called normalization. Thus, if there is a set of mutually orthogonal nonzero vectors (or an orthogonal basis) in an inner product space, then the vectors can be converted to unit vectors by normalizing them without losing their mutual orthogonality.


Problem 5.6 Normalize each of the following vectors in the Euclidean space $\mathbb{R}^3$:
(1) $u = (2, 1, -1)$, (2) $v = (1/2, 1/3, -1/4)$.


Definition 5.4 A set of vectors $\{x_1, x_2, \ldots, x_k\}$ in an inner product space $V$ is said to be orthonormal if
$$\langle x_i, x_j\rangle = \delta_{ij} = \begin{cases} 0 & \text{if } i \ne j \quad \text{(orthogonality)}, \\ 1 & \text{if } i = j \quad \text{(normality)}. \end{cases}$$

A set $\{x_1, x_2, \ldots, x_n\}$ of vectors is called an orthonormal basis for $V$ if it is a basis and orthonormal.


Problem 5.7 Determine whether each of the following sets of vectors in R? is 
orthogonal, orthonormal, or neither with respect to the Euclidean inner product. 


o ifopll etlblsl 
ofi blaj @ thik) el] 
Notice that, if $x = (x_1, \ldots, x_n) = \sum_i x_ie_i$ in $\mathbb{R}^n$, then $x_i = x \cdot e_i$. This also holds in any inner product space with an orthonormal basis.


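Theorem 5.6 Suppose $\{u_1, u_2, \ldots, u_k\}$ is an orthonormal basis for a subspace $U$ of an inner product space $V$. Then, for any vector $x$ in $U$,
$$x = \langle u_1, x\rangle u_1 + \langle u_2, x\rangle u_2 + \cdots + \langle u_k, x\rangle u_k.$$

Proof: For any vector $x \in U$, we can write $x = x_1u_1 + x_2u_2 + \cdots + x_ku_k$ as a linear combination of the basis vectors. Then, for each $i = 1, \ldots, k$,
$$\begin{aligned} \langle u_i, x\rangle &= \langle u_i, x_1u_1 + \cdots + x_ku_k\rangle \\ &= x_1\langle u_i, u_1\rangle + \cdots + x_i\langle u_i, u_i\rangle + \cdots + x_k\langle u_i, u_k\rangle \\ &= x_i, \end{aligned}$$
because $\{u_1, u_2, \ldots, u_k\}$ is orthonormal.

Corollary 5.7 If $\alpha = \{v_1, v_2, \ldots, v_n\}$ is an orthonormal basis for $V$, then any vector $x$ in $V$ can be written uniquely as
$$x = \langle v_1, x\rangle v_1 + \langle v_2, x\rangle v_2 + \cdots + \langle v_n, x\rangle v_n.$$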
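Corollary 5.7 says that, with an orthonormal basis, coordinates are just inner products and no linear system has to be solved. A small sketch in $\mathbb{R}^2$ with the dot product, where the orthonormal basis $\{(1/\sqrt2, 1/\sqrt2), (1/\sqrt2, -1/\sqrt2)\}$ and the vector $x = (3, 1)$ are illustrative choices:

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

s = 1 / math.sqrt(2)
basis = [(s, s), (s, -s)]    # an orthonormal basis of R^2 (illustrative choice)
x = (3.0, 1.0)

# Corollary 5.7: the coordinates of x are the inner products <v_i, x>.
coeffs = [dot(v, x) for v in basis]          # approximately [2*sqrt(2), sqrt(2)]
rebuilt = [sum(c * v[k] for c, v in zip(coeffs, basis)) for k in range(2)]
print(coeffs, rebuilt)                       # rebuilt is approximately [3.0, 1.0]
```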


Theorem 5.8 below shows that, in an inner product space, one can always 
find an orthonormal basis. For this, we first look at the following example 
which illustrates how to construct an orthonormal basis for an inner product 
space V. 




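Example 5.8 For the matrix
$$A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 2 & 2 \\ 1 & 0 & 4 \\ 1 & 1 & 0 \end{bmatrix},$$
find an orthonormal basis for the column space $C(A)$ of $A$.

Solution: Let $c_1$, $c_2$ and $c_3$ be the column vectors of $A$, in order from left to right. It is easily verified that they are linearly independent, so they form a basis for the column space $C(A)$, which is a 3-dimensional subspace of $\mathbb{R}^4$.

(1) We first normalize $c_1$:
$$v_1 = \frac{c_1}{\|c_1\|} = \frac{c_1}{2} = \left(\tfrac12, \tfrac12, \tfrac12, \tfrac12\right).$$
Then $v_1$ is a unit vector, and $c_1 = 2v_1$ means $C(A) = C(B_1)$, where $B_1 = [v_1\ c_2\ c_3]$.

(2) Clearly, $c_2 - \langle v_1, c_2\rangle v_1 = c_2 - 2v_1 = (0, 1, -1, 0)$ is a nonzero vector orthogonal to $v_1$. Thus,
$$v_2 = \frac{c_2 - 2v_1}{\|c_2 - 2v_1\|} = \left(0, \tfrac{1}{\sqrt2}, -\tfrac{1}{\sqrt2}, 0\right)$$
is a unit vector, and $\{v_1, v_2\}$ is orthonormal. Since $c_2 = 2v_1 + \sqrt2\,v_2$, $C(A) = C(B_2)$, where $B_2 = [v_1\ v_2\ c_3]$.

(3) Similarly, $c_3 - \langle v_1, c_3\rangle v_1 - \langle v_2, c_3\rangle v_2 = c_3 - 4v_1 + \sqrt2\,v_2 = (0, 1, 1, -2)$ is also a nonzero vector orthogonal to both $v_1$ and $v_2$. In fact, one can easily check that
$$\begin{aligned} \langle v_1, c_3 - 4v_1 + \sqrt2\,v_2\rangle &= \langle v_1, c_3\rangle - 4\langle v_1, v_1\rangle + \sqrt2\,\langle v_1, v_2\rangle = 4 - 4 + 0 = 0, \\ \langle v_2, c_3 - 4v_1 + \sqrt2\,v_2\rangle &= \langle v_2, c_3\rangle - 4\langle v_2, v_1\rangle + \sqrt2\,\langle v_2, v_2\rangle = -\sqrt2 - 0 + \sqrt2 = 0. \end{aligned}$$
By normalization, the vector
$$v_3 = \frac{c_3 - \langle v_1, c_3\rangle v_1 - \langle v_2, c_3\rangle v_2}{\|c_3 - \langle v_1, c_3\rangle v_1 - \langle v_2, c_3\rangle v_2\|} = \frac{1}{\sqrt6}(0, 1, 1, -2)$$
is a unit vector, and so $\{v_1, v_2, v_3\}$ is orthonormal. Since $c_3 = 4v_1 - \sqrt2\,v_2 + \sqrt6\,v_3$, we have $C(A) = C(B_3)$, where $B_3 = [v_1\ v_2\ v_3]$, and so $\{v_1, v_2, v_3\}$ is an orthonormal basis for $C(A)$ by Theorem 5.5.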


The orthonormalization process in Example 5.8 indicates how to prove 
the following general case, called the Gram-Schmidt orthogonalization. 




Theorem 5.8 Every inner product space has an orthonormal basis.
Proof: [Gram-Schmidt orthogonalization process I] Let $\{x_1, x_2, \ldots, x_n\}$ be any basis for an n-dimensional inner product space $V$. Let
$$v_1 = \frac{x_1}{\|x_1\|}, \qquad v_2 = \frac{x_2 - \langle v_1, x_2\rangle v_1}{\|x_2 - \langle v_1, x_2\rangle v_1\|}.$$
Of course, $x_2 - \langle v_1, x_2\rangle v_1 \ne 0$, because $\{x_1, x_2\}$ is linearly independent.

Inductively, we define, for $k = 3, \ldots, n$,
$$v_k = \frac{x_k - \langle v_1, x_k\rangle v_1 - \langle v_2, x_k\rangle v_2 - \cdots - \langle v_{k-1}, x_k\rangle v_{k-1}}{\|x_k - \langle v_1, x_k\rangle v_1 - \langle v_2, x_k\rangle v_2 - \cdots - \langle v_{k-1}, x_k\rangle v_{k-1}\|}.$$
Then, as Example 5.8 shows, all the necessary conditions for $\{v_1, \ldots, v_n\}$ to be an orthonormal basis for $V$ are easily verified.


Therefore, in any inner product space, one can always find an orthonormal basis.
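The process in the proof translates directly into code. The following is a minimal sketch of classical Gram-Schmidt in plain Python (floating-point, so results are approximate); the function name `gram_schmidt`, its `ip` parameter and the tolerance `1e-12` used to detect linear dependence are our own choices. Applied to the columns of the matrix $A$ of Example 5.8, it should reproduce the vectors $v_1$, $v_2$, $v_3$ found there.

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def gram_schmidt(vectors, ip=dot):
    """Classical Gram-Schmidt: turn a linearly independent list into an
    orthonormal list spanning the same subspace (as in Theorem 5.8)."""
    ortho = []
    for x in vectors:
        # subtract <v_i, x> v_i for every v_i already constructed
        w = list(x)
        for v in ortho:
            c = ip(v, x)
            w = [wk - c * vk for wk, vk in zip(w, v)]
        length = math.sqrt(ip(w, w))
        if length < 1e-12:
            raise ValueError("vectors are linearly dependent")
        ortho.append([wk / length for wk in w])   # normalize
    return ortho

# Columns c1, c2, c3 of the matrix A in Example 5.8.
cols = [(1, 1, 1, 1), (1, 2, 0, 1), (2, 2, 4, 0)]
v1, v2, v3 = gram_schmidt(cols)
print(v1)   # [0.5, 0.5, 0.5, 0.5]
print(v3)   # approximately (0, 1, 1, -2)/sqrt(6)
```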


Problem 5.8 Use the Gram-Schmidt orthogonalization on the Euclidean space $\mathbb{R}^4$ to transform the basis
$$\{(0, 1, 1, 0), (-1, 1, 0, 0), (1, 2, 0, -1), (-1, 0, 0, -1)\}$$
into an orthonormal basis.
Problem 5.9 Find an orthonormal basis for the subspace of the Euclidean space $\mathbb{R}^3$ given by $x + 2y - 3z = 0$, which is the orthogonal complement of the vector $(1, 2, -3)$ in $\mathbb{R}^3$.


Problem 5.10 Let $V = C[0, 1]$ with the inner product
$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx \quad \text{for any } f \text{ and } g \text{ in } V.$$
Find an orthonormal basis for the subspace spanned by $1$, $x$ and $x^2$.


Note that the matrix representation $A = [a_{ij}]$ of the inner product with respect to an orthonormal basis $\alpha$ is the identity matrix, since $a_{ij} = \langle v_i, v_j\rangle = \delta_{ij}$. This means that the natural isomorphism $\Phi : V \to \mathbb{R}^n$ given by $\Phi(v_i) = [v_i]_\alpha = e_i$, $i = 1, \ldots, n$ (see the last remark of Section 3.4), preserves the inner product of vectors: for two vectors $x = \sum_{i=1}^n x_iv_i$ and $y = \sum_{j=1}^n y_jv_j$ with $x_i = \langle v_i, x\rangle$ and $y_j = \langle v_j, y\rangle$,
$$\langle x, y\rangle = \sum_{i,j=1}^n x_iy_j\langle v_i, v_j\rangle = \sum_{i=1}^n x_iy_i = [x]_\alpha^T[y]_\alpha,$$




where $[x]_\alpha$ and $[y]_\alpha$ are the coordinate vectors of $x$ and $y$ with respect to $\alpha$. The right side of this equation is just the dot product of vectors in the Euclidean space $\mathbb{R}^n$. That is,
$$\langle x, y\rangle = [x]_\alpha \cdot [y]_\alpha = \Phi(x) \cdot \Phi(y)$$
for any $x, y \in V$. Hence, $\Phi$ identifies the inner product on $V$ with the dot product on $\mathbb{R}^n$. In this sense, we may restrict our study of an inner product space to the case of the Euclidean n-space $\mathbb{R}^n$ with the dot product.

A linear transformation that preserves the inner product, such as the natural isomorphism from $V$ to $\mathbb{R}^n$, plays an important role in linear algebra, and we will discuss this kind of transformation in Section 5.9.


5.5 Orthogonal projections 


In Section 3.3, we studied projections in general. Here, we discuss some particular projections, called orthogonal projections. For a subspace $U$ of an inner product space $V$, there is a particular choice of complementary subspace $W$, called the orthogonal complement of $U$, and the projection onto $U$ along this $W$ is called the orthogonal projection onto $U$. For this, we first extend the orthogonality of two vectors to that of two subspaces of $V$.


Definition 5.5 Let $U$ and $W$ be subspaces of an inner product space $V$.

(1) $U$ and $W$ are said to be orthogonal, written $U \perp W$, if $\langle u, w\rangle = 0$ for every $u \in U$ and $w \in W$.

(2) The set of all vectors in $V$ that are orthogonal to every vector in $U$ is called the orthogonal complement of $U$, denoted by $U^\perp$, i.e.,
$$U^\perp = \{v \in V : \langle v, u\rangle = 0 \text{ for all } u \in U\}.$$

One can easily show that $U^\perp$ is a subspace of $V$ and $U \perp U^\perp$, and that $v \in U^\perp$ if and only if $\langle v, u\rangle = 0$ for every $u \in \beta$, where $\beta$ is a basis for $U$. Clearly $W \perp U$ if and only if $W \subseteq U^\perp$, or equivalently $U \subseteq W^\perp$.


Problem 5.11 Show that: (1) If $U \perp W$, then $U \cap W = \{0\}$. (2) $U \subseteq W$ if and only if $W^\perp \subseteq U^\perp$.


Theorem 5.9 Let $U$ be a subspace of an inner product space $V$. Then $V = U \oplus U^\perp$, and $(U^\perp)^\perp = U$.




Proof: Let $\beta = \{v_1, v_2, \ldots, v_k\}$ be an orthonormal basis for $U$. Using the Gram-Schmidt orthogonalization again, one can extend $\beta$ to an orthonormal basis for $V$, say $\alpha = \{v_1, v_2, \ldots, v_k, v_{k+1}, \ldots, v_n\}$. Then one can easily show that $\gamma = \{v_{k+1}, \ldots, v_n\}$ forms an orthonormal basis for $U^\perp$. Thus it is clear that $V = U \oplus U^\perp$ and $(U^\perp)^\perp = U$.


Problem 5.12 Let $U$ and $W$ be subspaces of an inner product space $V$. Show that

(1) $(U + W)^\perp = U^\perp \cap W^\perp$. (2) $(U \cap W)^\perp = U^\perp + W^\perp$.


Problem 5.13 Let $U \subseteq \mathbb{R}^4$ be the subspace of the Euclidean 4-space $\mathbb{R}^4$ spanned by $(1, 1, 0, 0)$ and $(1, 0, 1, 0)$, and let $W \subseteq \mathbb{R}^4$ be the subspace spanned by $(0, 1, 0, 1)$ and $(0, 0, 1, 1)$. Find a basis for and the dimension of each of the following subspaces:

(1) $U + W$, (2) $U^\perp$, (3) $U^\perp + W^\perp$, (4) $U \cap W$.


Definition 5.6 For a subspace $U$ of an inner product space $V$, the projection $P : V = U \oplus U^\perp \to V$ from $V$ onto $U = \mathrm{Im}(P)$ along $U^\perp = \mathrm{Ker}(P)$ is called the orthogonal projection of $V$ onto $U$, denoted by $P = \mathrm{Proj}_U$. For $x \in V$, $x = u + w \in U \oplus U^\perp$ is called the orthogonal decomposition of $x$, and $\mathrm{Proj}_U(x) = u \in U$ is called the orthogonal projection of $x$ into $U$.


Hence, a projection $P : V \to V$ is orthogonal if and only if $\mathrm{Im}(P) \perp \mathrm{Ker}(P)$.


Example 5.9 Let U, W and V be subspaces of R? as in Example 3.10. 
Then clearly $W = U^\perp$ and $V \ne U^\perp$. Hence, $T_U = \mathrm{Proj}_U$ is the orthogonal projection, but $S_U$ is not: $S_U \ne \mathrm{Proj}_U$.


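Theorem 5.10 Let $U$ be a subspace of an inner product space $V$. Then, for any $x \in V$, the orthogonal projection $\mathrm{Proj}_U(x)$ of $x$ into $U$ satisfies
$$\|x - \mathrm{Proj}_U(x)\| \le \|x - y\|$$
for all $y \in U$. The equality holds if and only if $y = \mathrm{Proj}_U(x)$.

Proof: For any $y \in U$, we have $x - \mathrm{Proj}_U(x) \in U^\perp$ and $\mathrm{Proj}_U(x) - y \in U$, so $(x - \mathrm{Proj}_U(x)) \perp (\mathrm{Proj}_U(x) - y)$. Hence, by the Pythagorean theorem,
$$\|x - y\|^2 = \|x - \mathrm{Proj}_U(x)\|^2 + \|\mathrm{Proj}_U(x) - y\|^2 \ge \|x - \mathrm{Proj}_U(x)\|^2.$$
The equality holds if and only if $\|\mathrm{Proj}_U(x) - y\|^2 = 0$, i.e., $y = \mathrm{Proj}_U(x)$.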




Theorem 5.10 means that the orthogonal projection $\mathrm{Proj}_U(x)$ of $x$ is the unique vector in $U$ that is closest to $x$: it minimizes the distance from $x$ to the vectors in $U$.


Problem 5.14 Find the point on the plane $x - y - z = 0$ that is closest to $p = (1, 2, 0)$.


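Note that $V = U \oplus U^\perp = \mathrm{Im}(\mathrm{Proj}_U) \oplus \mathrm{Ker}(\mathrm{Proj}_U) = U^\perp \oplus (U^\perp)^\perp$ by Theorem 3.9. Thus, for any vector $x \in V$, $x = \mathrm{Proj}_U(x) + w$, where $\mathrm{Proj}_U(x) \in U$ and $w = \mathrm{Proj}_{U^\perp}(x) = x - \mathrm{Proj}_U(x) \in U^\perp$. Thus $\mathrm{Proj}_{U^\perp} = \mathrm{Id}_V - \mathrm{Proj}_U$ is the projection of $V$ onto $U^\perp = \mathrm{Ker}(\mathrm{Proj}_U)$ along $U$ (see Corollary 3.10). Figure 5.1 depicts the geometric picture of the vectors $\mathrm{Proj}_U(x)$ and $\mathrm{Proj}_{U^\perp}(x)$: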


Figure 5.1: Orthogonal projection


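If we have an orthonormal basis $\beta = \{u_1, \ldots, u_k\}$ for $U$, which always exists by Theorem 5.8, then the formula for the orthogonal projection can be made explicit. For any $x \in V$, we have $x = u + w \in U \oplus U^\perp$ and
$$u = \langle u_1, u\rangle u_1 + \langle u_2, u\rangle u_2 + \cdots + \langle u_k, u\rangle u_k$$
by Theorem 5.6. However, since $\langle u_i, w\rangle = 0$ for $i = 1, \ldots, k$, we have $\langle u_i, u\rangle = \langle u_i, u + w\rangle = \langle u_i, x\rangle$. Thus, for any $x \in V$,
$$\mathrm{Proj}_U(x) = u = \langle u_1, x\rangle u_1 + \langle u_2, x\rangle u_2 + \cdots + \langle u_k, x\rangle u_k.$$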


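Theorem 5.11 Let $\beta = \{u_1, \ldots, u_k\}$ be an orthonormal basis for a subspace $U$ of an inner product space $V$. Then the orthogonal projection $\mathrm{Proj}_U : V \to V$ of $V$ onto $U$ is written as
$$\mathrm{Proj}_U(x) = \langle u_1, x\rangle u_1 + \langle u_2, x\rangle u_2 + \cdots + \langle u_k, x\rangle u_k$$
for $x \in V$, so that $\mathrm{Im}(\mathrm{Proj}_U) = U$ and $\mathrm{Ker}(\mathrm{Proj}_U) = U^\perp$.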
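Theorem 5.11 gives a direct recipe for computing projections. The sketch below implements it for $\mathbb{R}^n$ with the dot product; the plane $U \subseteq \mathbb{R}^3$ with orthonormal basis $u_1 = (1/\sqrt2, 1/\sqrt2, 0)$, $u_2 = (0, 0, 1)$, the vector $x = (1, 2, 3)$ and the sample points of $U$ are illustrative choices. It also checks that $x - \mathrm{Proj}_U(x)$ is orthogonal to $U$ and, as Theorem 5.10 predicts, that $\mathrm{Proj}_U(x)$ is at least as close to $x$ as those sample points.

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def project(x, onb):
    """Proj_U(x) = <u_1, x> u_1 + ... + <u_k, x> u_k for an orthonormal basis onb of U."""
    p = [0.0] * len(x)
    for u in onb:
        c = dot(u, x)
        p = [pk + c * uk for pk, uk in zip(p, u)]
    return p

s = 1 / math.sqrt(2)
onb = [(s, s, 0.0), (0.0, 0.0, 1.0)]      # orthonormal basis of a plane U in R^3
x = (1.0, 2.0, 3.0)
p = project(x, onb)                       # approximately [1.5, 1.5, 3.0]
w = [xk - pk for xk, pk in zip(x, p)]     # x - Proj_U(x), approximately [-0.5, 0.5, 0.0]
print(p, [dot(u, w) for u in onb])        # inner products with u_1, u_2 are (numerically) 0

# Theorem 5.10: no other point of U is closer to x (sample points of U chosen arbitrarily).
for y in [(1.0, 1.0, 3.0), (2.0, 2.0, 0.0), (0.0, 0.0, 0.0)]:
    assert math.dist(x, p) <= math.dist(x, y)
```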




Problem 5.15 Let $U \subseteq \mathbb{R}^4$ be the subspace of the Euclidean 4-space $\mathbb{R}^4$ spanned by $(1, 1, 0, 0)$ and $(1, 0, 1, 0)$, and let $W \subseteq \mathbb{R}^4$ be the subspace spanned by $(0, 1, 0, 1)$ and $(0, 0, 1, 1)$. Find a basis for and the dimension of each of the following subspaces:

(1) $U + W$, (2) $U^\perp$, (3) $U^\perp + W^\perp$, (4) $U \cap W$.


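In particular, if $U = \{ru : r \in \mathbb{R}\}$ is a 1-dimensional subspace determined by a unit vector $u$ in an inner product space $V$, then the orthogonal projection of a vector $x$ in $V$ into $U$ is simply
$$\mathrm{Proj}_U(x) = \langle u, x\rangle u,$$
where $\langle u, x\rangle = \|x\|\cos\theta$ and $\theta$ is the angle between $u$ and $x$ (for $x \ne 0$). Hence $y = x - \langle u, x\rangle u \in U^\perp$ and
$$x = \langle u, x\rangle u + y \in U \oplus U^\perp.$$
From this, the Pythagorean theorem $\|x\|^2 = \|y\|^2 + |\langle u, x\rangle|^2$ follows.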


In particular, if $V = \mathbb{R}^n$ is the Euclidean space with the dot product, then
$$\mathrm{Proj}_U(x) = (u \cdot x)u = (u^Tx)u = u(u^Tx) = (uu^T)x.$$
(Here, the last two equalities come from the facts that $u \cdot x = u^Tx$ is a scalar and the matrix product $uu^Tx$ is well-defined and associative.) This equation shows that the matrix representation of the orthogonal projection $\mathrm{Proj}_U$ with respect to the standard basis $\alpha$ for $\mathbb{R}^n$ is
$$[\mathrm{Proj}_U]_\alpha = uu^T.$$


Example 5.10 Consider the subspace $U = \{ru : r \in \mathbb{R}\}$ in $\mathbb{R}^3$ spanned by the unit vector $u = \left(\frac{1}{\sqrt3}, \frac{1}{\sqrt3}, \frac{1}{\sqrt3}\right)$.


Figure 5.2: Projection onto a line 




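Then the projection $\mathrm{Proj}_U$ onto $U$ is
$$[\mathrm{Proj}_U]_\alpha = uu^T = \begin{bmatrix} \frac{1}{\sqrt3} \\ \frac{1}{\sqrt3} \\ \frac{1}{\sqrt3} \end{bmatrix} \begin{bmatrix} \frac{1}{\sqrt3} & \frac{1}{\sqrt3} & \frac{1}{\sqrt3} \end{bmatrix} = \frac13 \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$$
Thus, if $x = (1, 2, 3) \in \mathbb{R}^3$, then
$$\mathrm{Proj}_U(x) = \frac13 \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 2 \end{bmatrix}.$$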
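Example 5.10 can be reproduced with exact rational arithmetic by writing $uu^T = aa^T/(a^Ta)$ for the unnormalized direction $a = (1, 1, 1)$, which avoids square roots. The helper names below are hypothetical; the formula $aa^T/(a^Ta)$ is just $uu^T$ with $u = a/\|a\|$.

```python
from fractions import Fraction

def projection_matrix(a):
    """Matrix a a^T / (a^T a) of the orthogonal projection onto the line spanned
    by a nonzero vector a; it equals u u^T for the unit vector u = a / ||a||."""
    aa = sum(ak * ak for ak in a)
    return [[Fraction(ai * aj, aa) for aj in a] for ai in a]

def apply(P, x):
    """Matrix-vector product P x."""
    return [sum(pij * xj for pij, xj in zip(row, x)) for row in P]

P = projection_matrix([1, 1, 1])   # direction of u = (1, 1, 1)/sqrt(3) in Example 5.10
print(P[0])                        # [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
print(apply(P, [1, 2, 3]))         # [Fraction(2, 1), Fraction(2, 1), Fraction(2, 1)]
```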


Example 5.11 Let $\ell$ be the line in the plane $\mathbb{R}^2$ defined by $ax + by + c = 0$. Find the distance from a point $P = (x_0, y_0)$ to the line $\ell$.


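Solution: Note that the vector $n = (a, b)$ is perpendicular to the line $\ell$, since for any two distinct points $Q = (x_1, y_1)$ and $R = (x_2, y_2)$ on the line, $\overrightarrow{QR} \cdot n = a(x_2 - x_1) + b(y_2 - y_1) = 0$. Then the distance $d$ between the point $P$ and the line $\ell$ is simply the length of the orthogonal projection of $\overrightarrow{QP}$ onto the line $U$ spanned by the unit vector
$$u = \frac{n}{\|n\|} = \frac{(a, b)}{\sqrt{a^2 + b^2}}.$$
Now,
$$[\mathrm{Proj}_U]_\alpha = uu^T = \frac{1}{a^2 + b^2}\begin{bmatrix} a^2 & ab \\ ab & b^2 \end{bmatrix}.$$
Since $\overrightarrow{QP} = (x_0 - x_1, y_0 - y_1)$,
$$\mathrm{Proj}_U(\overrightarrow{QP}) = \frac{1}{a^2 + b^2}\begin{bmatrix} a^2 & ab \\ ab & b^2 \end{bmatrix}\begin{bmatrix} x_0 - x_1 \\ y_0 - y_1 \end{bmatrix} = \frac{a(x_0 - x_1) + b(y_0 - y_1)}{a^2 + b^2}\begin{bmatrix} a \\ b \end{bmatrix} = \frac{ax_0 + by_0 + c}{a^2 + b^2}\begin{bmatrix} a \\ b \end{bmatrix},$$
where the last step uses $ax_1 + by_1 = -c$, because $Q$ lies on $\ell$.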


Figure 5.3: Distance from a point to a line 




Thus,
$$d = \|\mathrm{Proj}_U(\overrightarrow{QP})\| = \frac{|ax_0 + by_0 + c|}{\sqrt{a^2 + b^2}}.$$
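The derivation in Example 5.11 can be followed step by step in code: pick a point $Q$ on $\ell$, project $\overrightarrow{QP}$ onto $n = (a, b)$, and take the length. The sketch below (function names are our own; it assumes $(a, b) \ne (0, 0)$) chooses $Q = -c\,n/\|n\|^2$, which lies on $\ell$ because $a \cdot \frac{-ac}{a^2 + b^2} + b \cdot \frac{-bc}{a^2 + b^2} + c = 0$, and compares the result with the closed formula.

```python
import math

def distance_to_line(a, b, c, x0, y0):
    """Distance from P = (x0, y0) to the line ax + by + c = 0, computed as the
    length of the orthogonal projection of QP onto n = (a, b)."""
    nn = a * a + b * b                       # ||n||^2, assumed nonzero
    q = (-a * c / nn, -b * c / nn)           # a point Q on the line
    qp = (x0 - q[0], y0 - q[1])              # the vector QP
    coeff = (a * qp[0] + b * qp[1]) / nn     # Proj_n(QP) = coeff * n
    return math.hypot(coeff * a, coeff * b)

def distance_formula(a, b, c, x0, y0):
    """Closed formula |a x0 + b y0 + c| / sqrt(a^2 + b^2) from Example 5.11."""
    return abs(a * x0 + b * y0 + c) / math.sqrt(a * a + b * b)

print(distance_to_line(3, 4, -5, 0, 0))     # approximately 1.0
print(distance_formula(3, 4, -5, 0, 0))     # 1.0
```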


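In general, if $U$ is a $k$-dimensional subspace of $\mathbb{R}^n$ with an orthonormal basis $\{u_1, u_2, \ldots, u_k\}$, then for any $x \in \mathbb{R}^n$
$$\begin{aligned} \mathrm{Proj}_U(x) &= (u_1 \cdot x)u_1 + (u_2 \cdot x)u_2 + \cdots + (u_k \cdot x)u_k \\ &= u_1(u_1^Tx) + u_2(u_2^Tx) + \cdots + u_k(u_k^Tx) \\ &= (u_1u_1^T + u_2u_2^T + \cdots + u_ku_k^T)x. \end{aligned}$$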


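Corollary 5.12 Let $U$ be a subspace of $\mathbb{R}^n$ with an orthonormal basis $\{u_1, \ldots, u_k\}$. Then the orthogonal projection $\mathrm{Proj}_U$ is
$$\mathrm{Proj}_U = \mathrm{Proj}_{u_1} + \mathrm{Proj}_{u_2} + \cdots + \mathrm{Proj}_{u_k} = u_1u_1^T + u_2u_2^T + \cdots + u_ku_k^T = [\mathrm{Proj}_U]_\alpha,$$
which is also the matrix representation with respect to the standard basis $\alpha$ for $\mathbb{R}^n$. Thus
$$\mathrm{Proj}_{u_i} \circ \mathrm{Proj}_{u_j} = \begin{cases} \mathrm{Proj}_{u_i} & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$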


Definition 5.7 The matrix representation $[\mathrm{Proj}_U]_\alpha$ of the orthogonal projection $\mathrm{Proj}_U : \mathbb{R}^n \to \mathbb{R}^n$ of $\mathbb{R}^n$ onto a subspace $U$ with respect to the standard basis $\alpha$ is called the (orthogonal) projection matrix on $U$.


Theorem 5.13 A square matrix $P$ is an orthogonal projection matrix if and only if $P$ is symmetric and idempotent, i.e., $P^T = P$ and $P^2 = P$.

Proof: If $P$ is an orthogonal projection onto a subspace $U$ of $\mathbb{R}^n$, then
$$P = u_1u_1^T + u_2u_2^T + \cdots + u_ku_k^T$$
for some orthonormal basis $\{u_1, \ldots, u_k\}$ for $U$ by Corollary 5.12. Thus $P^T = P$. $P^2 = P$ follows by Theorem 3.9, or by a direct computation. Conversely, $P^2 = P$ means $P$ is a projection, again by Theorem 3.9. Thus $\mathbb{R}^n = \mathrm{Im}(P) \oplus \mathrm{Ker}(P) = C(P) \oplus N(P)$, and $P(u) = u$ for $u \in \mathrm{Im}(P)$. Since $N(P) = N(P^T)$ (because $P^T = P$) and $C(P) \perp N(P^T)$, we have $\mathrm{Im}(P) \perp \mathrm{Ker}(P)$, i.e., $P$ is orthogonal.
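Theorem 5.13 gives a test that needs no knowledge of the subspace $U$. The sketch below (exact arithmetic via `fractions.Fraction`; helper names are illustrative) applies it to the projection matrix $\frac13\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$ of Example 5.10 and to $\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}$, which is idempotent (it projects onto the $x$-axis along the line spanned by $(1, -1)$) but not symmetric, hence not an orthogonal projection.

```python
from fractions import Fraction

def transpose(P):
    return [list(r) for r in zip(*P)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def is_orthogonal_projection(P):
    """Theorem 5.13 test: P is symmetric (P^T = P) and idempotent (P^2 = P)."""
    return transpose(P) == P and matmul(P, P) == P

third = Fraction(1, 3)
P_line = [[third] * 3 for _ in range(3)]     # u u^T for u = (1, 1, 1)/sqrt(3)
P_oblique = [[1, 1], [0, 0]]                 # idempotent but not symmetric
print(is_orthogonal_projection(P_line))      # True
print(matmul(P_oblique, P_oblique) == P_oblique, is_orthogonal_projection(P_oblique))  # True False
```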




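Example 5.12 If $A = [c_1\ c_2]$, where $c_1 = (1, 0, 0)$ and $c_2 = (0, 1, 0)$, then the column vectors of $A$ are orthonormal, $C(A)$ is the $xy$-plane, and the projection of $b = (x, y, z) \in \mathbb{R}^3$ onto $C(A)$ is $b_c = (x, y, 0)$. In fact,
$$P = AA^T = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},$$
which is equal to $c_1c_1^T + c_2c_2^T$.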
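Example 5.13 Let $P_i : \mathbb{R}^n \to \mathbb{R}^n$ be defined by
$$P_i(x_1, \ldots, x_n) = (0, \ldots, 0, x_i, 0, \ldots, 0)$$
for $i = 1, \ldots, n$. Then each $P_i$ is the orthogonal projection of $\mathbb{R}^n$ onto the $i$-th axis, whose matrix form looks like
$$P_i = \begin{bmatrix} 0 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & & & \ddots & \\ & & & & 0 \end{bmatrix}, \qquad I - P_i = \begin{bmatrix} 1 & & & & \\ & \ddots & & & \\ & & 0 & & \\ & & & \ddots & \\ & & & & 1 \end{bmatrix},$$
with the entry $1$ (respectively $0$) in the $(i, i)$ position. When we restrict the image to $\mathbb{R}$, $P_i$ is an element of the dual space $\mathbb{R}^{n*}$, and is usually denoted by $x_i$ as the $i$-th coordinate function (see Example 3.25).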


Problem 5.16 For any square matrix $P$, show that $P$ satisfies $P^TP = P$ if and only if $P$ is an orthogonal projection matrix.


Problem 5.17 Show that if $\{v_1, v_2, \ldots, v_n\}$ is an orthonormal basis for $\mathbb{R}^n$, then $v_1v_1^T + v_2v_2^T + \cdots + v_nv_n^T = I_n$.


5.6 Relations of fundamental subspaces 


With the concept of the orthogonal projection of vectors onto a subspace, 
one can completely determine the solution set of a system Ax = b. 


Lemma 5.14 For an $m \times n$ matrix $A$, the null space $N(A)$ and the row space $R(A)$ are orthogonal, i.e., $N(A) \perp R(A)$ in $\mathbb{R}^n$. Similarly, $N(A^T) \perp C(A)$ in $\mathbb{R}^m$.




Proof: Note that $w \in N(A)$ if and only if $Aw = 0$, i.e., $r \cdot w = 0$ for each row vector $r$ of $A$. The second statement follows by applying the first to $A^T$, whose row space is $C(A)$.


From Lemma 5.14, it is clear that
$$N(A) \subseteq R(A)^\perp \quad (\text{or } R(A) \subseteq N(A)^\perp), \quad \text{and} \quad N(A^T) \subseteq C(A)^\perp \quad (\text{or } C(A) \subseteq N(A^T)^\perp).$$
Moreover, by comparing the dimensions of these subspaces and by using Theorem 5.9 and Rank Theorem 2.17, we have
$$\begin{aligned} \dim R(A) + \dim N(A) &= n = \dim R(A) + \dim R(A)^\perp, \\ \dim C(A) + \dim N(A^T) &= m = \dim C(A) + \dim C(A)^\perp. \end{aligned}$$
This means that the inclusions are actually equalities.

Lemma 5.15 (1) $N(A) = R(A)^\perp$ (or $R(A) = N(A)^\perp$).
(2) $N(A^T) = C(A)^\perp$ (or $C(A) = N(A^T)^\perp$).


That is, the row space $R(A)$ and the null space $N(A)$ are orthogonal complements of each other in $\mathbb{R}^n$, and so are the column space $C(A)$ and the null space $N(A^T)$ of $A^T$ in $\mathbb{R}^m$. Hence, by Theorem 5.9, we have the following theorem.


Theorem 5.16 For any $m \times n$ matrix $A$,

(1) $N(A) \oplus R(A) = \mathbb{R}^n$,
(2) $N(A^T) \oplus C(A) = \mathbb{R}^m$.

Figure 5.4: Relations of fundamental subspaces




For an $m \times n$ matrix $A$ of rank $r$, regarded as a linear transformation $A : \mathbb{R}^n \to \mathbb{R}^m$, Theorem 5.16 is depicted in Figure 5.4. Note that if $\mathrm{rank}\,A = r$, so that $\dim R(A) = r = \dim C(A)$, then $\dim N(A) = n - r$ and $\dim N(A^T) = m - r$.


Corollary 5.17 The set of solutions to a consistent system $Ax = b$ is precisely $x_0 + N(A)$, where $x_0$ is any solution of $Ax = b$.

Proof: Let $x_0 \in \mathbb{R}^n$ be a solution of $Ax = b$. If $y \in x_0 + N(A) \subseteq \mathbb{R}^n$, then $y = x_0 + z$ for some $z \in N(A)$. But $A(x_0 + z) = Ax_0 = b$ means $y$ is also a solution. Conversely, if $x$ is another solution, then $A(x - x_0) = 0$ means $x - x_0$ is in the null space $N(A)$. By setting $z = x - x_0$, we get $x = x_0 + z$, i.e., $x \in x_0 + N(A)$.


In particular, if $\mathrm{rank}\,A = m \le n$, then $N(A^T) = \{0\}$ and $C(A) = \mathbb{R}^m$. Thus for any $b \in \mathbb{R}^m$, the system $Ax = b$ has at least one solution in $\mathbb{R}^n$ (this is the existence Theorem 2.24).

On the other hand, if $\mathrm{rank}\,A = n \le m$, then $N(A) = \{0\}$ and $R(A) = \mathbb{R}^n$. Therefore, the system $Ax = b$ has at most one solution; that is, it has a unique solution $x$ in $R(A)$ if $b \in C(A)$, and has no solution if $b \notin C(A)$ (this is the uniqueness Theorem 2.25). The latter case may occur when $m > r = \mathrm{rank}\,A$, that is, when $N(A^T)$ is a nontrivial subspace of $\mathbb{R}^m$; this case will be discussed in the next section.


Problem 5.18 Given two vectors $(1, 2, 1, 2)$ and $(0, -1, -1, 1)$ in $\mathbb{R}^4$, find all vectors in $\mathbb{R}^4$ that are perpendicular to them.


Problem 5.19 Prove the following.
(1) If $Ax = b$ and $A^Ty = 0$, then $y^Tb = 0$.
(2) If $Ax = 0$ and $A^Ty = c$, then $x^Tc = 0$.


Problem 5.20 Find a basis for the orthogonal complement of the row space of A: 


1 2 8 001 
ma-i -1 i. @a=foo i) 
3 0 6 1 11 




5.7 Least square solutions 


In the previous section, we determined the solution set completely for a system $Ax = b$ when $b \in C(A)$. In this section, we discuss what we can do when the system $Ax = b$ is inconsistent, that is, when $b \notin C(A) \subseteq \mathbb{R}^m$. Of course, one cannot find a solution in this case, but one can find 'pseudo'-solutions in the following sense.

Note that $Ax \in C(A)$ for any vector $x$ in $\mathbb{R}^n$. Thus, if $Ax = b$ is inconsistent, the best we can do is to find a vector $x_0 \in \mathbb{R}^n$ such that $Ax_0$ is closest to $b \in \mathbb{R}^m$. Now, any $b \in \mathbb{R}^m$ has a unique orthogonal decomposition $b = b_c + b_n \in C(A) \oplus N(A^T) = \mathbb{R}^m$, in which $b_c$ is the vector in $C(A)$ closest to $b$, and $Ax = b_c$ is always consistent. A solution $x_0 \in \mathbb{R}^n$ of $Ax = b_c$ gives us the best approximation $Ax_0 = b_c$ to $b$. Any such solution $x_0$ is called a least square solution of $Ax = b$.

In summary, to find a least square solution, we first need to find the unique orthogonal decomposition of $b$ as
$$b = b_c + b_n \in C(A) \oplus N(A^T) = \mathbb{R}^m.$$
Then $b_c$ is the orthogonal projection $b_c = \mathrm{Proj}_{C(A)}(b) \in C(A)$ of $b$ onto $C(A)$, and it has two basic properties:

(1) It is the closest vector to $b$ among the vectors in $C(A)$ (see Theorem 5.10).

(2) There always exists a solution $x_0 \in \mathbb{R}^n$ of $Ax = b_c$, since $b_c \in C(A)$.

Therefore, a least square solution $x_0 \in \mathbb{R}^n$ of $Ax = b$ is just a solution of $Ax = b_c$. In particular, if $b \in C(A)$, then $b = b_c$, so that the least square solutions are just the 'true' solutions of $Ax = b$. The first property of $b_c$ means that a least square solution $x_0 \in \mathbb{R}^n$ of $Ax = b$ gives the best approximation $Ax_0 = b_c$ to $b$, i.e., for any vector $x$ in $\mathbb{R}^n$,
$$\|Ax_0 - b\| \le \|Ax - b\|.$$

Furthermore, if $x_0 \in \mathbb{R}^n$ is a least square solution, then all the other least square solutions are the vectors in $x_0 + N(A)$ by Corollary 5.17.

The computation of $b_c$ from $b \in \mathbb{R}^m$ may be easy if we are given an orthonormal basis for $C(A)$, due to Corollary 5.12. In general, one may try to find an orthonormal basis for $C(A)$ by Theorem 5.8 (Gram-Schmidt orthogonalization), and then use Corollary 5.12. However, the Gram-Schmidt orthogonalization, which is the only way we know so far for finding an orthonormal basis for an inner product space, takes a lot of computation.




Fortunately, there is a direct way of finding a least square solution, as we will see below, and then one may use this solution to find the orthogonal projection.

Let $x_0 \in \mathbb{R}^n$ be a least square solution of $Ax = b$. Then
$$Ax_0 - b = Ax_0 - (b_c + b_n) = -b_n \in N(A^T)$$
holds, since $Ax_0 = b_c$. Thus, $A^T(Ax_0 - b) = A^T(-b_n) = 0$, or equivalently $A^TAx_0 = A^Tb$; that is, $x_0$ is a solution of the equation
$$A^TAx = A^Tb.$$
This equation is called the normal equation of $Ax = b$. Note that this normal equation is always solvable: $A^T$ kills the component $b_n$ of $b$ that lies outside of $C(A)$, so $A^Tb = A^Tb_c$, and any solution of the consistent system $Ax = b_c$ solves it. The next theorem says that, conversely, every solution of the normal equation gives rise to a least square solution.


Theorem 5.18 Let $A$ be an $m \times n$ matrix, and let $b \in \mathbb{R}^m$ be any vector. Then a vector $x_0 \in \mathbb{R}^n$ is a least square solution of $Ax = b$ if and only if $x_0$ is a solution of the normal equation $A^TAx = A^Tb$.

Proof: We only need to show the sufficiency of the normal equation. If $x_0$ is a solution of the equation $A^TAx = A^Tb$, then $A^T(Ax_0 - b) = 0$, so $Ax_0 - b \in N(A^T)$. If we set $Ax_0 - b = c \in N(A^T)$ and write $b = b_c + b_n \in C(A) \oplus N(A^T)$, then we get $Ax_0 - b_c = c + b_n \in C(A) \cap N(A^T) = \{0\}$. Therefore $Ax_0 = b_c = \mathrm{Proj}_{C(A)}(b)$, i.e., $x_0$ is a least square solution of $Ax = b$.


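Example 5.14 Find all the least square solutions to $Ax = b$, and then determine the orthogonal projection $b_c$ of $b$ into the column space $C(A)$ of $A$, where
$$A = \begin{bmatrix} 1 & -2 & 1 \\ 2 & -3 & -1 \\ -1 & 1 & 2 \\ 3 & -5 & 0 \end{bmatrix}, \qquad b = \begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix}.$$
Solution:
$$A^TA = \begin{bmatrix} 1 & 2 & -1 & 3 \\ -2 & -3 & 1 & -5 \\ 1 & -1 & 2 & 0 \end{bmatrix}\begin{bmatrix} 1 & -2 & 1 \\ 2 & -3 & -1 \\ -1 & 1 & 2 \\ 3 & -5 & 0 \end{bmatrix} = \begin{bmatrix} 15 & -24 & -3 \\ -24 & 39 & 3 \\ -3 & 3 & 6 \end{bmatrix}$$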




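and
$$A^Tb = \begin{bmatrix} 1 & 2 & -1 & 3 \\ -2 & -3 & 1 & -5 \\ 1 & -1 & 2 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 3 \end{bmatrix}.$$
From the normal equation, a least square solution of $Ax = b$ is a solution of $A^TAx = A^Tb$, i.e.,
$$\begin{bmatrix} 15 & -24 & -3 \\ -24 & 39 & 3 \\ -3 & 3 & 6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ -1 \\ 3 \end{bmatrix}.$$
By solving this system of equations (left as an exercise), one obtains all the least square solutions, which are of the form
$$x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \frac13\begin{bmatrix} -8 \\ -5 \\ 0 \end{bmatrix} + t\begin{bmatrix} 5 \\ 3 \\ 1 \end{bmatrix}.$$
Note that the set of solutions is $x_0 + N(A)$, where $x_0 = [-8/3\ \ -5/3\ \ 0]^T$, which need not be in $R(A)$, and $N(A) = \{t[5\ 3\ 1]^T : t \in \mathbb{R}\}$.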
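The answer to Example 5.14 can be double-checked with exact arithmetic. The sketch below (plain Python with `fractions.Fraction`; helper names are illustrative) uses the $A$ and $b$ of the example, forms $A^TA$ and $A^Tb$, confirms that several members of $x_0 + N(A)$ satisfy the normal equation and all give the same $Ax$, and then evaluates $b_c = Ax_0$, which should satisfy $A^T(b - b_c) = 0$.

```python
from fractions import Fraction

A = [[1, -2, 1],
     [2, -3, -1],
     [-1, 1, 2],
     [3, -5, 0]]
b = [1, 0, 1, 0]

def transpose(M):
    return [list(r) for r in zip(*M)]

def matvec(M, x):
    return [sum(m * xk for m, xk in zip(row, x)) for row in M]

def matmul(M, N):
    Nt = transpose(N)
    return [[sum(m * n for m, n in zip(row, col)) for col in Nt] for row in M]

AtA = matmul(transpose(A), A)   # [[15, -24, -3], [-24, 39, 3], [-3, 3, 6]]
Atb = matvec(transpose(A), b)   # [0, -1, 3]

x0 = [Fraction(-8, 3), Fraction(-5, 3), Fraction(0)]
null_vec = [5, 3, 1]            # spans N(A)
for t in range(-2, 3):
    x = [xk + t * nk for xk, nk in zip(x0, null_vec)]
    assert matvec(AtA, x) == Atb             # each x0 + t n solves the normal equation
    assert matvec(A, x) == matvec(A, x0)     # and all give the same A x = b_c

bc = matvec(A, x0)
print(bc)   # [Fraction(2, 3), Fraction(-1, 3), Fraction(1, 1), Fraction(1, 3)]
```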


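Problem 5.21 Find all least square solutions $x$ in $\mathbb{R}^3$ of $Ax = b$, where
$$A = \begin{bmatrix} 1 & 0 & 2 \\ 0 & 2 & 2 \\ -1 & 1 & -1 \\ -1 & 2 & 0 \end{bmatrix}, \qquad b = \begin{bmatrix} 3 \\ -3 \\ 0 \\ -3 \end{bmatrix}.$$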


As Example 5.14 shows, a least square solution can be found by Gauss-Jordan elimination even when $A^TA$ is not invertible. Moreover, if $b \in C(A)$ or if the rows of $A$ are linearly independent (thus, $\mathrm{rank}\,A = m$ and $C(A) = \mathbb{R}^m$), then $b_c = b$, and so the system $Ax = b$ is consistent. Thus the least square solutions coincide with the true solutions. Consequently, a solution of the normal equation $A^TAx = A^Tb$ for a given system $Ax = b$, consistent or inconsistent, is either a true solution or a least square solution.




Therefore, the most general method for solving a system of linear equations is to work with the normal equation. In practice, if the square matrix A^TA is invertible, then the normal equation A^T Ax = A^T b of the system Ax = b resolves to x = (A^TA)^{-1}A^T b, which is then the unique least square solution. In particular, if A^TA = I_n, or equivalently the columns of A are orthonormal, then the normal equation reduces to the least square solution x = A^T b.

The following theorem gives a condition for AT A to be invertible. 


Theorem 5.19 For any m × n matrix A, A^TA is a symmetric n × n square matrix and rank(A^TA) = rank A.


Proof: Clearly, A^TA is square and symmetric. Since the number of columns of A and of A^TA are both n, we have

$$\operatorname{rank} A + \dim N(A) = n = \operatorname{rank}(A^TA) + \dim N(A^TA).$$

We now show that N(A) = N(A^TA), so that rank(A^TA) = rank A. Clearly, N(A) ⊆ N(A^TA), since Ax = 0 implies A^T Ax = 0. Conversely, suppose that x ∈ N(A^TA), so that A^T Ax = 0. Then

$$A\mathbf{x}\cdot A\mathbf{x} = (A\mathbf{x})^T(A\mathbf{x}) = \mathbf{x}^T(A^TA\mathbf{x}) = \mathbf{x}\cdot A^TA\mathbf{x} = \mathbf{x}\cdot\mathbf{0} = 0.$$

Hence Ax = 0, and x ∈ N(A).
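As a quick sanity check of Theorem 5.19, one can compare the ranks of A and A^TA by exact elimination. This is a minimal sketch under our own assumptions. The function names `rank` and `gram` are hypothetical helpers, and the matrices come from Examples 5.14 and 5.15 plus one rank-one matrix chosen for illustration.

```python
from fractions import Fraction as Fr

def rank(M):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    R = [[Fr(x) for x in row] for row in M]
    r = 0
    for c in range(len(R[0])):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue                          # no pivot in this column
        R[r], R[p] = R[p], R[r]
        for i in range(r + 1, len(R)):
            f = R[i][c] / R[r][c]
            R[i] = [x - f * y for x, y in zip(R[i], R[r])]
        r += 1
        if r == len(R):
            break
    return r

def gram(A):
    """A^T A: the matrix of dot products of the columns of A."""
    cols = list(zip(*A))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]

A = [[1, -2, 1], [2, -3, -1], [-1, 1, 2], [3, -5, 0]]   # Example 5.14, rank 2
AtA = gram(A)
print(rank(A), rank(AtA))   # 2 2
```

Because N(A) = N(A^TA), the same null vector (5, 3, 1) of Example 5.14 is also killed by A^TA.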


Therefore, A^TA is invertible if and only if rank A = n, that is, the columns of A are linearly independent. In this case N(A) = {0}, and so the system Ax = b has a unique least square solution x_0 in R(A) = R^n:

$$\mathbf{x}_0 = (A^TA)^{-1}A^T\mathbf{b}.$$

Moreover, since b_c = Ax_0 = A(A^TA)^{-1}A^T b, we also obtain the orthogonal projection Proj_{C(A)} = A(A^TA)^{-1}A^T as a byproduct when rank A = n. This can be summarized in the following theorem:


Theorem 5.20 Let A be an m × n matrix. If rank A = n, or equivalently the columns of A are linearly independent, then
(1) A^TA is invertible, so that (A^TA)^{-1}A^T is a left inverse of A,
(2) the vector x_0 = (A^TA)^{-1}A^T b is the unique least square solution of a system Ax = b, and
(3) Ax_0 = A(A^TA)^{-1}A^T b = b_c = Proj_{C(A)}(b); that is, the orthogonal projection of R^m onto C(A) is Proj_{C(A)} = A(A^TA)^{-1}A^T.




Remark: (1) For an m × n matrix A, by applying Theorem 5.19 to A^T, we see that rank A = m, i.e., the rows of A are linearly independent, if and only if AA^T is invertible. In this case A^T(AA^T)^{-1} is a right inverse of A (cf. Theorem 2.24).
(2) If the columns u_1, ..., u_n of A are orthonormal, then they form an orthonormal basis for the column space C(A) of A, so that the orthogonal projection onto C(A) reduces to a simpler form, as in Corollary 5.12: the orthonormality of the columns of A implies A^TA = I_n, so that

$$\mathrm{Proj}_{C(A)} = A(A^TA)^{-1}A^T = AA^T = [\mathbf{u}_1 \cdots \mathbf{u}_n]\begin{bmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix} = \mathbf{u}_1\mathbf{u}_1^T + \cdots + \mathbf{u}_n\mathbf{u}_n^T.$$


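Moreover, the least square solution in this case is given as

$$\mathbf{x}_0 = A^T\mathbf{b} = \begin{bmatrix} \mathbf{u}_1^T \\ \vdots \\ \mathbf{u}_n^T \end{bmatrix}\mathbf{b} = \begin{bmatrix} \mathbf{u}_1\cdot\mathbf{b} \\ \vdots \\ \mathbf{u}_n\cdot\mathbf{b} \end{bmatrix},$$

which is the coordinate expression of b_c = Proj_{C(A)}(b) with respect to the orthonormal basis {u_1, ..., u_n} for C(A).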
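Remark (2) can be illustrated with floating-point arithmetic. This sketch assumes two orthonormal columns u_1 = (1, 1, 0)/√2 and u_2 = (0, 0, 1) chosen only for illustration, with a hypothetical b = (1, 3, 2). It builds Proj_{C(A)} = u_1u_1^T + u_2u_2^T and reads the least square solution directly as x_0 = A^T b = (u_1 · b, u_2 · b).

```python
import math

def projection_from_orthonormal(us):
    """Proj_{C(A)} = A A^T = u_1 u_1^T + ... + u_n u_n^T for orthonormal columns u_i of A."""
    m = len(us[0])
    return [[sum(u[i] * u[j] for u in us) for j in range(m)] for i in range(m)]

def least_squares_orthonormal(us, b):
    """x0 = A^T b, i.e. the coordinates u_i . b of b_c in the basis {u_i}."""
    return [sum(ui * bi for ui, bi in zip(u, b)) for u in us]

r = 1 / math.sqrt(2)
us = [[r, r, 0.0], [0.0, 0.0, 1.0]]        # hypothetical orthonormal columns of A
b = [1.0, 3.0, 2.0]

P = projection_from_orthonormal(us)
x0 = least_squares_orthonormal(us, b)
bc = [x0[0] * us[0][i] + x0[1] * us[1][i] for i in range(3)]   # b_c = x0_1 u_1 + x0_2 u_2
```

No linear system is solved here: orthonormality turns the normal equation into plain dot products. That is the practical reason the QR decomposition below is useful.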

(3) If rank A = r < n, one can reduce the columns of A to a basis for the column space and work with this reduced m × r matrix Â, whose r columns are linearly independent (thus Â^TÂ is invertible). The orthogonal projection b_c of b in Ax = b can then be found to be

$$\mathbf{b}_c = \hat{A}(\hat{A}^T\hat{A})^{-1}\hat{A}^T\mathbf{b}.$$

However, a least square solution of Ax = b should be found from the original normal equation directly, since x̂_0 = (Â^TÂ)^{-1}Â^T b ∈ R^r has only r components, so it cannot be a solution of Ax = b_c. The QR decomposition of A, which will be discussed in Section 5.10, will simplify the computations. In Section 7.6, we derive a general formula for finding the optimal solution of a general system of linear equations.


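Example 5.15 Find all the least square solutions of the system

$$A\mathbf{x} = \begin{bmatrix} 1 & 2 \\ 1 & 5 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \\ 9 \end{bmatrix} = \mathbf{b}.$$

Determine also the orthogonal projection b_c of b in the column space C(A).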




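Solution: Clearly, the two columns of A are linearly independent and C(A) is the xy-plane. Thus b ∉ C(A). Note that

$$A^TA = \begin{bmatrix} 1 & 1 & 0 \\ 2 & 5 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & 5 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 7 \\ 7 & 29 \end{bmatrix},$$

which is invertible. By a simple computation one can obtain

$$(A^TA)^{-1} = \frac19\begin{bmatrix} 29 & -7 \\ -7 & 2 \end{bmatrix}.$$

Hence,

$$\mathbf{x}_0 = (A^TA)^{-1}A^T\mathbf{b} = \frac19\begin{bmatrix} 29 & -7 \\ -7 & 2 \end{bmatrix}\begin{bmatrix} 7 \\ 23 \end{bmatrix} = \frac19\begin{bmatrix} 42 \\ -3 \end{bmatrix} = \begin{bmatrix} 14/3 \\ -1/3 \end{bmatrix}$$

is a least square solution, which is unique since N(A) = {0}. The orthogonal projection of b in C(A) is

$$\mathbf{b}_c = A\mathbf{x}_0 = \begin{bmatrix} 1 & 2 \\ 1 & 5 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 14/3 \\ -1/3 \end{bmatrix} = \begin{bmatrix} 4 \\ 3 \\ 0 \end{bmatrix}.$$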
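Theorem 5.20 can be sketched as a short routine that forms the normal equation and solves it exactly. The code is our illustration and not a library API: `solve` and `least_squares_full_rank` are hypothetical helper names, and the routine assumes that the columns of A are linearly independent, as in Example 5.15.

```python
from fractions import Fraction as Fr

def solve(M, y):
    """Solve M x = y for an invertible square M by exact Gauss-Jordan elimination."""
    n = len(M)
    aug = [[Fr(v) for v in row] + [Fr(t)] for row, t in zip(M, y)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [u - f * w for u, w in zip(aug[i], aug[c])]
    return [row[n] for row in aug]

def least_squares_full_rank(A, b):
    """x0 = (A^T A)^{-1} A^T b, assuming the columns of A are linearly independent."""
    cols = list(zip(*A))
    AtA = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    Atb = [sum(p * q for p, q in zip(ci, b)) for ci in cols]
    return solve(AtA, Atb)

A = [[1, 2], [1, 5], [0, 0]]                             # Example 5.15
b = [4, 3, 9]
x0 = least_squares_full_rank(A, b)                       # [14/3, -1/3]
bc = [sum(a * x for a, x in zip(row, x0)) for row in A]  # [4, 3, 0]
```

Solving A^TA x = A^T b directly, rather than forming (A^TA)^{-1} explicitly, is the usual design choice. It gives the same x_0 with less work.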


Problem 5.22 Find all the least square solutions of the following inconsistent system of linear equations:

$$\begin{aligned} x + y &= 1 \\ 2x + 2y &= 3 \\ 3x + 4y &= 4. \end{aligned}$$


Problem 5.23 Let W be the subspace of the Euclidean space R^3 spanned by the vectors v_1 = (1, 1, 2) and v_2 = (1, 1, −1). Find Proj_W(b) for b = (1, 3, −2).


In general, for a given n-dimensional subspace U of R^m, the computation of the orthogonal projection Proj_U of R^m onto U appears quite often in applied science and engineering problems. For this, we first take a basis for U and then make an m × n matrix A with these basis vectors as columns. Then we have U = C(A), and so by Theorem 5.20

$$\mathrm{Proj}_U = \mathrm{Proj}_{C(A)} = A(A^TA)^{-1}A^T = [\mathrm{Proj}_U]_\alpha.$$

In fact, this matrix is the orthogonal projection matrix itself: the matrix representation of Proj_U with respect to the standard basis α. Note that this projection matrix is independent of the choice of a basis for U, due to the uniqueness of the matrix representation of a linear transformation with respect to a fixed basis.




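Example 5.16 Find the projection matrix P onto the plane 2x − y − 3z = 0 in the space R^3, and calculate Pb for b = (1, 0, 1).

Solution: Choose any basis for the plane 2x − y − 3z = 0, say, v_1 = (0, 3, −1) and v_2 = (1, 2, 0). Let

$$A = \begin{bmatrix} 0 & 1 \\ 3 & 2 \\ -1 & 0 \end{bmatrix}$$

be the matrix with v_1 and v_2 as columns. Then

$$(A^TA)^{-1} = \begin{bmatrix} 10 & 6 \\ 6 & 5 \end{bmatrix}^{-1} = \frac{1}{14}\begin{bmatrix} 5 & -6 \\ -6 & 10 \end{bmatrix}.$$

The projection matrix P = Proj_{C(A)} is

$$P = A(A^TA)^{-1}A^T = \frac{1}{14}\begin{bmatrix} 0 & 1 \\ 3 & 2 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} 5 & -6 \\ -6 & 10 \end{bmatrix}\begin{bmatrix} 0 & 3 & -1 \\ 1 & 2 & 0 \end{bmatrix} = \frac{1}{14}\begin{bmatrix} 10 & 2 & 6 \\ 2 & 13 & -3 \\ 6 & -3 & 5 \end{bmatrix},$$

and

$$P\mathbf{b} = \frac{1}{14}\begin{bmatrix} 10 & 2 & 6 \\ 2 & 13 & -3 \\ 6 & -3 & 5 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} = \frac{1}{14}\begin{bmatrix} 16 \\ -1 \\ 11 \end{bmatrix}.$$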
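Example 5.16 and the remark on basis independence can be checked in exact arithmetic. This sketch uses hypothetical helpers (`matmul`, `transpose`, `inverse`, `projection`) written only for illustration. It computes P = A(A^TA)^{-1}A^T from the basis v_1, v_2, then recomputes P from a second basis (v_1 + v_2, v_2) of the same plane, and compares both results with I − nn^T/14 for the normal vector n = (2, −1, −3).

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    return [[sum(Fr(a) * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(c) for c in zip(*X)]

def inverse(M):
    """Inverse of an invertible square matrix by Gauss-Jordan on [M | I] (exact)."""
    n = len(M)
    aug = [[Fr(v) for v in row] + [Fr(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [u - f * w for u, w in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

def projection(A):
    """Orthogonal projection matrix A (A^T A)^{-1} A^T onto C(A)."""
    At = transpose(A)
    return matmul(matmul(A, inverse(matmul(At, A))), At)

A = [[0, 1], [3, 2], [-1, 0]]        # columns v1 = (0, 3, -1), v2 = (1, 2, 0)
P = projection(A)
Pb = [row[0] for row in matmul(P, [[1], [0], [1]])]       # (16/14, -1/14, 11/14)
A2 = [[1, 1], [5, 2], [-1, 0]]       # another basis: v1 + v2 and v2
```

`projection(A2)` returns the same matrix P, which illustrates numerically that the projection matrix depends only on the subspace and not on the basis chosen.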


Problem 5.24 Find the orthogonal projection matrix P of the Euclidean 3-space R^3 onto the column space C(A) for

$$A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}.$$

Problem 5.25 Find the matrix for the orthogonal projection from the Euclidean 3-space R^3 to the plane spanned by the vectors (1, 1, 1) and (1, 0, 2).


Problem 5.26 Let V = P_3(R) be the vector space of polynomials of degree ≤ 3 equipped with the inner product

$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx \quad\text{for any } f \text{ and } g \text{ in } V.$$

Let W be the subspace of V spanned by {1, x}, and define f(x) = x^2. Find the orthogonal projection Proj_W(f) of f on W.


Remark: In R^n, the Gram-Schmidt orthogonalization may be reformulated as follows. For a basis {c_1, ..., c_n} for R^n, let A_i be the n × i matrix whose columns are c_1, ..., c_i, and let P_i be the projection Proj_{C(A_i)} = A_i(A_i^TA_i)^{-1}A_i^T




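onto the column space C(A_i) of A_i for i = 1, ..., n. Then P_i c_j = c_j for j ≤ i, and so we set

$$\mathbf{u}_1 = \frac{\mathbf{c}_1}{\|\mathbf{c}_1\|} \quad\text{and}\quad \mathbf{u}_i = \frac{\mathbf{c}_i - P_{i-1}\mathbf{c}_i}{\|\mathbf{c}_i - P_{i-1}\mathbf{c}_i\|}$$

for i = 2, ..., n. Then {u_1, ..., u_n} is an orthonormal basis for R^n. In practical computation, once {u_1, ..., u_i} has been obtained,

$$P_i = \mathbf{u}_1\mathbf{u}_1^T + \cdots + \mathbf{u}_i\mathbf{u}_i^T = P_{i-1} + \mathbf{u}_i\mathbf{u}_i^T, \qquad P_{i-1}\mathbf{c}_i = (\mathbf{c}_i\cdot\mathbf{u}_1)\mathbf{u}_1 + \cdots + (\mathbf{c}_i\cdot\mathbf{u}_{i-1})\mathbf{u}_{i-1},$$

which gives the same formula as the one in the Gram-Schmidt orthogonalization.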
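The projection form of Gram-Schmidt in the Remark can be sketched with the rank-one update P_i = P_{i−1} + u_iu_i^T, so that no matrix inverse is ever formed. This floating-point sketch is ours: the function name and the sample basis of R^3 are hypothetical.

```python
import math

def gram_schmidt_by_projections(cs):
    """Orthonormalize c_1, ..., c_n via u_i = (c_i - P_{i-1} c_i) / ||c_i - P_{i-1} c_i||,
    updating the projection matrix as P_i = P_{i-1} + u_i u_i^T (P_0 = 0)."""
    m = len(cs[0])
    P = [[0.0] * m for _ in range(m)]
    us = []
    for c in cs:
        w = [c[i] - sum(P[i][j] * c[j] for j in range(m)) for i in range(m)]
        length = math.sqrt(sum(x * x for x in w))   # nonzero if the c's are independent
        u = [x / length for x in w]
        us.append(u)
        P = [[P[i][j] + u[i] * u[j] for j in range(m)] for i in range(m)]
    return us, P

cs = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]   # a hypothetical basis of R^3
us, P = gram_schmidt_by_projections(cs)   # P ends up as P_3 = I_3
```

Since the c's form a basis of R^3, the last projection P_3 is the identity. The vectors u_1, u_2 come out as (1, 1, 0)/√2 and (1, −1, 2)/√6.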


5.8 Applications: Approximations by polynomials 


There is a wide range of applications of least square solutions in experimental science. One simple application is the following example, which determines the constants of a spring.


Example 5.17 Hooke's law for springs in physics says that, for a uniform spring, the length stretched or compressed is a linear function of the force applied; that is, the force F applied to a spring is related to the length x stretched or compressed by the equation

$$F = a + kx,$$

where a and k are constants depending on the material of the spring.

Suppose now that, given a spring of length 6.1 inches, we want to determine the constants a and k from the following experimental data: the lengths are measured to be 7.6, 8.7 and 10.4 inches when forces of 2, 4 and 6 kilograms, respectively, are applied to the spring. Thus, these data may be plotted as

$$(x, F) = (6.1, 0),\ (7.6, 2),\ (8.7, 4),\ (10.4, 6)$$

in the xF-plane. These points certainly do not lie on a straight line of the form F = a + kx in the xF-plane, which may be caused by experimental errors. Hence the system of linear equations

$$\begin{aligned} F_1 &= a + 6.1k = 0, \\ F_2 &= a + 7.6k = 2, \\ F_3 &= a + 8.7k = 4, \\ F_4 &= a + 10.4k = 6 \end{aligned}$$




is not consistent (i.e., it has no solution, so the second equality in each equation may not be a true equality). Thus, the best thing one can do is to determine the straight line F = a + kx that "fits" the data best: that is, the line that minimizes the vertical distances |F_i − y_i| between the points F_i on the line and the data y_i, for i = 1, 2, 3, 4. For this we consider the points on the line F = (F_1, F_2, F_3, F_4) and the data b = (0, 2, 4, 6) = (y_1, y_2, y_3, y_4) as two vectors in R^4, and minimize the distance

$$\|\mathbf{b} - \mathbf{F}\|^2 = (0 - F_1)^2 + (2 - F_2)^2 + (4 - F_3)^2 + (6 - F_4)^2$$

between these two points. Thus, for the original inconsistent system

$$A\mathbf{x} = \begin{bmatrix} 1 & 6.1 \\ 1 & 7.6 \\ 1 & 8.7 \\ 1 & 10.4 \end{bmatrix}\begin{bmatrix} a \\ k \end{bmatrix} = \begin{bmatrix} 0 \\ 2 \\ 4 \\ 6 \end{bmatrix} = \mathbf{b},$$

we are looking for F ∈ C(A), which is the projection of b onto the column space C(A) of A. The least square solution x_0 such that Ax_0 = F minimizes this distance, which is the reason for the name least square solution.


Figure 5.5: Least square fitting (the data points and the fitted line F = −8.6 + 1.4x)


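It is now easily computed as

$$\begin{bmatrix} a \\ k \end{bmatrix} = \mathbf{x}_0 = (A^TA)^{-1}A^T\mathbf{b} \approx \begin{bmatrix} -8.6 \\ 1.4 \end{bmatrix}.$$

It gives F = −8.6 + 1.4x.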
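For a straight line, the 2 × 2 normal equation A^TA x = A^Tb can be written out in terms of the sums of x_i, x_i^2, y_i and x_iy_i. The minimal sketch below applies this to the spring data of Example 5.17 to reproduce the rounded constants. The function name `fit_line` is our own and not from the text.

```python
def fit_line(xs, ys):
    """Least square line y = a + k x from the 2x2 normal equation A^T A [a, k] = A^T b,
    where A has rows (1, x_i); A^T A = [[n, sx], [sx, sxx]], A^T b = [sy, sxy]."""
    n = len(xs)
    sx, sxx = sum(xs), sum(x * x for x in xs)
    sy, sxy = sum(ys), sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx              # det(A^T A), nonzero unless all x_i are equal
    a = (sxx * sy - sx * sxy) / det
    k = (n * sxy - sx * sy) / det
    return a, k

a, k = fit_line([6.1, 7.6, 8.7, 10.4], [0, 2, 4, 6])   # a ≈ -8.64, k ≈ 1.42
```

Rounded to one decimal, this gives the line F = −8.6 + 1.4x of the text.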


In general, a common problem in experimental work is to obtain a poly- 
nomial y = f(x) in two variables x and y that ‘fits’ the data of various values 




of y determined experimentally for inputs x, say

$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n),$$

plotted in the xy-plane. Some possible fitting polynomials are

(1) by a straight line: y = a + bx,

(2) by a quadratic polynomial: y = a + bx + cx^2, or

(3) by a polynomial of degree k: y = a_0 + a_1x + ⋯ + a_kx^k, etc.

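As a general case, suppose that we are looking for a polynomial y = f(x) = a_0 + a_1x + a_2x^2 + ⋯ + a_kx^k of degree k that passes through the given data. Then we obtain a system of linear equations,

$$\begin{aligned} f(x_1) &= a_0 + a_1x_1 + a_2x_1^2 + \cdots + a_kx_1^k = y_1, \\ f(x_2) &= a_0 + a_1x_2 + a_2x_2^2 + \cdots + a_kx_2^k = y_2, \\ &\;\;\vdots \\ f(x_n) &= a_0 + a_1x_n + a_2x_n^2 + \cdots + a_kx_n^k = y_n, \end{aligned}$$

or, in matrix form, the system may be written as Ax = b:

$$\begin{bmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^k \\ 1 & x_2 & x_2^2 & \cdots & x_2^k \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^k \end{bmatrix}\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_k \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}.$$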


The left side Ax represents the values of the polynomial at the x_i's, and the right side represents the data obtained from the inputs x_i in the experiment. The cases n ≤ k + 1 have already been discussed in Section 2.9. If n > k + 1, this kind of system may be inconsistent (i.e., it may have no solution). This means that there may be no polynomial of degree k < n − 1 whose graph passes through the n data points (x_i, y_i) in the xy-plane. In practice, this is because experimental data usually contain some errors.
Thus, the best thing we can do is to find the polynomial f(x) that minimizes the sum of the squares of the vertical distances between the graph of the polynomial and the data. In matrix and vector space language, the inconsistency of the system means that the vector b = [y_1 ⋯ y_n]^T ∈ R^n representing the data does not belong to the column space C(A) of the coefficient matrix A. Minimizing the sum of the squares of the vertical distances between the graph of the polynomial and the data means looking for the least square solution of the system, because for any c ∈ C(A) of the




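form

$$\begin{bmatrix} 1 & x_1 & \cdots & x_1^k \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^k \end{bmatrix}\begin{bmatrix} a_0 \\ \vdots \\ a_k \end{bmatrix} = \begin{bmatrix} a_0 + a_1x_1 + \cdots + a_kx_1^k \\ \vdots \\ a_0 + a_1x_n + \cdots + a_kx_n^k \end{bmatrix} = \mathbf{c},$$

we have

$$\|\mathbf{b} - \mathbf{c}\|^2 = (y_1 - a_0 - a_1x_1 - \cdots - a_kx_1^k)^2 + \cdots + (y_n - a_0 - a_1x_n - \cdots - a_kx_n^k)^2.$$

The previous theory says that the orthogonal projection b_c of b into the column space of A minimizes this quantity, and it shows how to find b_c and a least square solution x_0.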
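The general polynomial fit can be sketched by building the matrix A[i][j] = x_i^j and solving the normal equation exactly. This is an illustrative sketch only. `polyfit` here is our own helper, not NumPy's function of the same name, and it assumes at least k + 1 distinct inputs x_i so that A^TA is invertible.

```python
from fractions import Fraction as Fr

def solve(M, y):
    """Solve M x = y for an invertible square M by exact Gauss-Jordan elimination."""
    n = len(M)
    aug = [row[:] + [t] for row, t in zip(M, y)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c:
                f = aug[i][c]
                aug[i] = [u - f * w for u, w in zip(aug[i], aug[c])]
    return [row[n] for row in aug]

def polyfit(xs, ys, k):
    """Coefficients a_0..a_k of the least square polynomial of degree k, from the normal
    equation A^T A a = A^T b with A[i][j] = x_i ** j (needs k + 1 distinct x_i)."""
    A = [[Fr(x) ** j for j in range(k + 1)] for x in xs]
    cols = list(zip(*A))
    AtA = [[sum(p * q for p, q in zip(ci, cj)) for cj in cols] for ci in cols]
    Atb = [sum(p * Fr(y) for p, y in zip(ci, ys)) for ci in cols]
    return solve(AtA, Atb)

line = polyfit([1, 2, 3, 4], [0, 3, 4, 4], 1)       # the data of Example 5.18
```

For the data of Example 5.18, `line` is [−1/2, 13/10], i.e., y = −1/2 + (13/10)x. If the data lie exactly on a polynomial of degree k, the system is consistent, and the least square solution returns that polynomial's coefficients.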


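Example 5.18 Find a straight line y = a + bx that best fits the given experimental data (1, 0), (2, 3), (3, 4) and (4, 4).

Solution: We are looking for a line y = a + bx that minimizes the sum of squares of the vertical distances |y_i − a − bx_i| from the line y = a + bx to the data (x_i, y_i). Adopting the matrix notation

$$A = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ 1 & x_4 \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \\ 1 & 4 \end{bmatrix}, \quad \mathbf{x} = \begin{bmatrix} a \\ b \end{bmatrix}, \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} 0 \\ 3 \\ 4 \\ 4 \end{bmatrix},$$

we have Ax = b and want to find a least square solution of Ax = b. Since the columns of A are linearly independent, the least square solution is x = (A^TA)^{-1}A^T b. Now,

$$A^TA = \begin{bmatrix} 4 & 10 \\ 10 & 30 \end{bmatrix}, \qquad (A^TA)^{-1} = \begin{bmatrix} 3/2 & -1/2 \\ -1/2 & 1/5 \end{bmatrix}, \qquad A^T\mathbf{b} = \begin{bmatrix} 11 \\ 34 \end{bmatrix}.$$

Hence, we have

$$\mathbf{x} = (A^TA)^{-1}A^T\mathbf{b} = \begin{bmatrix} -1/2 \\ 13/10 \end{bmatrix},$$

and y = −1/2 + (13/10)x is the desired line.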




Problem 5.27 From Newton's second law of motion, a body near the surface of the earth falls vertically downward according to the equation

$$s(t) = s_0 + v_0t + \frac12 gt^2,$$

where s(t) is the distance that the body has travelled in time t, s_0 and v_0 are the initial displacement and velocity of the body, respectively, and g is the gravitational acceleration at the earth's surface. Suppose a weight is released, and the distances that the body has fallen from some reference point were measured to be s = −0.18, 0.31, 1.03, 2.48, 3.73 feet at times t = 0.1, 0.2, 0.3, 0.4, 0.5 seconds, respectively. Determine approximate values of s_0, v_0, g using these data.


5.9 Orthogonal matrices and isometries 


At the end of Section 5.4, we saw that some linear transformations preserve the inner product and hence the lengths of vectors. Such transformations are called isometries, and they play a very important role in geometry. In this section, we discuss them.

Let A = [c_1 ⋯ c_n] be an n × n square matrix, where c_1, ..., c_n ∈ R^n are the column vectors of A. Then a simple computation shows that

$$A^TA = \begin{bmatrix} \mathbf{c}_1^T \\ \vdots \\ \mathbf{c}_n^T \end{bmatrix}\begin{bmatrix} \mathbf{c}_1 & \cdots & \mathbf{c}_n \end{bmatrix} = [\mathbf{c}_i^T\mathbf{c}_j] = [\mathbf{c}_i\cdot\mathbf{c}_j].$$

Hence, if the column vectors are orthonormal, c_i^Tc_j = δ_ij, then A^TA = I_n; that is, A^T is a left inverse of A. Since A is a square matrix, this left inverse must also be the right inverse of A, i.e., AA^T = I_n. Equivalently, the row vectors of A are also orthonormal. This argument can now be summarized as follows.


Lemma 5.21 Let A be an n × n matrix. The following are equivalent.
(1) The column vectors of A are orthonormal.
(2) A^TA = I_n.
(3) A^T = A^{-1}.
(4) AA^T = I_n.
(5) The row vectors of A are orthonormal.

Definition 5.8 A square matrix A is called an orthogonal matrix if A satisfies one (and hence all) of the conditions in Lemma 5.21.




Example 5.19 It is easy to see that the matrices

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad B = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$$

are orthogonal, and satisfy A^{-1} = A^T and B^{-1} = B^T. In fact, the linear transformation defined by Ax is the rotation through the angle θ, while the one defined by Bx is the reflection about the line passing through the origin that forms an angle θ/2 with the positive x-axis.

Conversely, one can show that any 2 × 2 orthogonal matrix must be of one of the forms A or B above.

Indeed, suppose that

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

is an orthogonal matrix, so that AA^T = I_2 = A^TA. By a direct computation, we get

$$a^2 + b^2 = c^2 + d^2 = a^2 + c^2 = b^2 + d^2 = 1, \qquad ac + bd = ab + cd = 0.$$

From the first equations, we get a = ±d and b = ±c. Then by the second equations, b = −c if a = d, and b = c if a = −d. Since c^2 + d^2 = 1, one can choose θ so that c = sin θ and d = cos θ. The first case gives the form A, and the second case gives the form B with θ replaced by π − θ.
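The geometric claims of Example 5.19 can be spot-checked numerically for one angle. This sketch is our own, and the helper names and the angle t = 0.7 are arbitrary choices. It verifies that both matrices satisfy M^TM = I_2, and that B fixes the direction at angle θ/2 while it reverses the normal of that line. It also checks that the determinants are 1 for the rotation and −1 for the reflection.

```python
import math

def rotation(t):
    """Matrix A of Example 5.19: rotation through the angle t."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

def reflection(t):
    """Matrix B of Example 5.19: reflection about the line at angle t/2."""
    return [[math.cos(t), math.sin(t)], [math.sin(t), -math.cos(t)]]

def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def is_orthogonal(M, tol=1e-12):
    """Check M^T M = I_2, i.e. the columns of M are orthonormal."""
    return all(abs(M[0][i] * M[0][j] + M[1][i] * M[1][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(2) for j in range(2))

def det(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

t = 0.7
A, B = rotation(t), reflection(t)
d = [math.cos(t / 2), math.sin(t / 2)]     # direction of the mirror line of B
n = [-math.sin(t / 2), math.cos(t / 2)]    # unit normal to that line
```

The identity B d = d reduces to cos(θ − θ/2) = cos(θ/2) and sin(θ − θ/2) = sin(θ/2). This is why the mirror line sits at half the angle.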


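Problem 5.28 Find the inverse of each of the following matrices.

$$(1)\ \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{bmatrix}, \qquad (2)\ \begin{bmatrix} 1/\sqrt2 & -1/\sqrt2 & 0 \\ -1/\sqrt2 & -1/\sqrt2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

What are they as linear transformations on R^3: rotations, reflections, or other?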


Intuitively, any rotation or reflection of the Euclidean space R^n preserves both the lengths of vectors and the angle between two vectors. In general, any orthogonal matrix A preserves the lengths of vectors:

$$\|A\mathbf{x}\|^2 = A\mathbf{x}\cdot A\mathbf{x} = (A\mathbf{x})^T(A\mathbf{x}) = \mathbf{x}^TA^TA\mathbf{x} = \mathbf{x}^T\mathbf{x} = \|\mathbf{x}\|^2.$$


Definition 5.9 Let V and W be two inner product spaces. A (linear) transformation T : V → W is called an isometry (a linear isometry, or an orthogonal transformation) if it preserves the lengths of vectors, that is, for every vector x ∈ V,

$$\|T(\mathbf{x})\| = \|\mathbf{x}\|.$$




Clearly, any orthogonal matrix is an isometry as a linear transformation. Note that a translation T : V → V given by T(x) = x + a for some fixed vector a ∈ V is not a linear transformation, but it is an isometry in the sense that it preserves distances: ||T(x) − T(y)|| = ||x − y||.

If T : V → W is an isometry, then T is one-to-one, since the kernel of T is trivial: T(x) = 0 implies ||x|| = ||T(x)|| = 0. Thus, if dim V = dim W, then a linear isometry is also an isomorphism. The following is an interesting characterization of a linear isometry.


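Theorem 5.22 Let T : V → W be a linear transformation from an inner product space V into an inner product space W. Then T is an isometry if and only if T preserves inner products: that is, for any vectors x, y in V,

$$\langle T(\mathbf{x}), T(\mathbf{y})\rangle = \langle \mathbf{x}, \mathbf{y}\rangle.$$

Proof: If T preserves inner products, then clearly it is an isometry. Conversely, let T be an isometry. Then, for any x, y ∈ V,

$$\langle T(\mathbf{x}+\mathbf{y}), T(\mathbf{x}+\mathbf{y})\rangle = \langle \mathbf{x}+\mathbf{y}, \mathbf{x}+\mathbf{y}\rangle.$$

Expanding both sides, using the linearity of T on the left, gives

$$\langle T(\mathbf{x}), T(\mathbf{x})\rangle + 2\langle T(\mathbf{x}), T(\mathbf{y})\rangle + \langle T(\mathbf{y}), T(\mathbf{y})\rangle = \langle \mathbf{x}, \mathbf{x}\rangle + 2\langle \mathbf{x}, \mathbf{y}\rangle + \langle \mathbf{y}, \mathbf{y}\rangle,$$

and since ⟨T(x), T(x)⟩ = ⟨x, x⟩ and ⟨T(y), T(y)⟩ = ⟨y, y⟩, we get ⟨T(x), T(y)⟩ = ⟨x, y⟩.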


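Recall that if θ is the angle between two nonzero vectors x and y in an inner product space V, then for any isometry T : V → V,

$$\cos\theta = \frac{\langle \mathbf{x}, \mathbf{y}\rangle}{\|\mathbf{x}\|\,\|\mathbf{y}\|} = \frac{\langle T\mathbf{x}, T\mathbf{y}\rangle}{\|T\mathbf{x}\|\,\|T\mathbf{y}\|}.$$

Corollary 5.23 A linear isometry preserves angles.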


On the other hand, a dilation T : V → V given by T(x) = cx for some fixed c ∈ R with c ≠ 0, ±1 preserves angles but not the lengths of vectors. That is, the converse of Corollary 5.23 is not true in general.

Corollary 5.24 Let T : V → W be an isometry of an inner product space V into another inner product space W. Then the image of any set of orthonormal vectors in V is also orthonormal.

We have seen that any orthogonal matrix is an isometry as a linear transformation T(x) = Ax. The converse is also true:




Corollary 5.25 If T : V → W is an isometry of an inner product space V into another inner product space W of the same dimension, then the matrix of T with respect to any orthonormal bases is an orthogonal matrix.

Note that d(x, y) = ||x − y|| for any x and y in V.

Corollary 5.26 A linear transformation T : V → W is an isometry if and only if, for any x and y in V,

$$d(T(\mathbf{x}), T(\mathbf{y})) = d(\mathbf{x}, \mathbf{y}).$$

Corollary 5.27 Let A be an n × n matrix. Then A is an orthogonal matrix if and only if A : R^n → R^n, as a linear transformation, preserves the dot product: that is, for any vectors x, y ∈ R^n,

$$A\mathbf{x}\cdot A\mathbf{y} = \mathbf{x}\cdot\mathbf{y}.$$


Remark: In summary, for a linear transformation T : V → W of inner product spaces, the following are equivalent:

(1) T is an isometry: that is, T preserves length.

(2) T preserves the inner product.

(3) T preserves the distance.

(4) The matrix [T]_α^β with respect to orthonormal bases α and β is an orthogonal matrix (when dim V = dim W).

Any one (hence all) of these conditions implies that T preserves angles, but not conversely. In particular, a square matrix A preserves the dot product if and only if it preserves the lengths of vectors. Thus, an isometry of an inner product space transforms a figure to a congruent figure, and for any two congruent figures there is an isometry that transforms one to the other. This means that classical Euclidean geometry can be studied in the framework of (linear) algebra.
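The equivalences in the Remark can be spot-checked numerically. This sketch assumes a hypothetical orthogonal matrix, a rotation about the x-axis, along with the dilation T(x) = 2x and two arbitrary test vectors. The orthogonal matrix preserves dot products, lengths, distances and angles, while the dilation preserves only angles.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def dist(x, y):
    return norm([a - b for a, b in zip(x, y)])

def angle(x, y):
    return math.acos(dot(x, y) / (norm(x) * norm(y)))

t = 1.1
Q = [[1.0, 0.0, 0.0],                       # hypothetical orthogonal matrix:
     [0.0, math.cos(t), -math.sin(t)],      # rotation about the x-axis
     [0.0, math.sin(t), math.cos(t)]]
D = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]   # dilation T(x) = 2x
x, y = [1.0, 2.0, -1.0], [0.5, -1.0, 3.0]

Qx, Qy, Dx, Dy = matvec(Q, x), matvec(Q, y), matvec(D, x), matvec(D, y)
```

The dilation doubles every length and distance, yet it keeps the angle between Dx and Dy equal to the angle between x and y. This is the counterexample to the converse of Corollary 5.23.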


Problem 5.29 Find values r > 0, s > 0, a, b and c such that the matrix Q is orthogonal.


| r sa | r -s a 
(DOSE 2 b], AARS 3s b 
| r o—-S ¢ | r —28 ¢ 


Problem 5.30 (Bessel's Inequality) Let V be an inner product space, and let {v_1, ..., v_m} be a set of orthonormal vectors in V (not necessarily a basis for V). Prove that for any x in V,

$$\|\mathbf{x}\|^2 \ge \sum_{i=1}^{m} |\langle \mathbf{x}, \mathbf{v}_i\rangle|^2.$$


Problem 5.31 Determine whether the following linear transformations on the Euclidean space R^3 are orthogonal.
(1) T(z, y, 2) = (2, e+ by 


j 7 5 è 
(2) T(z, Y, z) = (ge + Hz, Hy E Oe z). 




5.10 QR decomposition I 


For a given m × n matrix A = [c_1 c_2 ⋯ c_n], the orthogonal projection onto the column space C(A) = U may be computed directly as

$$\mathrm{Proj}_U = A(A^TA)^{-1}A^T,$$

provided rank A = n.

However, an orthonormal basis for U = C(A) plays a crucial role in computational linear algebra, and so the orthogonalization of vectors has become an essential technique. Basically, the only orthogonalization method we have so far is the Gram-Schmidt orthogonalization, which can be expressed in matrix form, called the QR decomposition.

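Let {c_1, ..., c_n} be an arbitrary basis for a subspace U ⊆ R^m. The Gram-Schmidt orthogonalization process applied to this basis may be written as the following steps. Let A = [c_1 ⋯ c_n], with the c_i's as its columns.

(1) From the basis {c_1, ..., c_n}, an orthogonal basis {q_1, ..., q_n} for U can be computed by the Gram-Schmidt process:

$$\begin{aligned} \mathbf{q}_1 &= \mathbf{c}_1, \\ \mathbf{q}_2 &= \mathbf{c}_2 - \frac{\langle \mathbf{q}_1, \mathbf{c}_2\rangle}{\langle \mathbf{q}_1, \mathbf{q}_1\rangle}\mathbf{q}_1, \\ &\;\;\vdots \\ \mathbf{q}_n &= \mathbf{c}_n - \frac{\langle \mathbf{q}_1, \mathbf{c}_n\rangle}{\langle \mathbf{q}_1, \mathbf{q}_1\rangle}\mathbf{q}_1 - \cdots - \frac{\langle \mathbf{q}_{n-1}, \mathbf{c}_n\rangle}{\langle \mathbf{q}_{n-1}, \mathbf{q}_{n-1}\rangle}\mathbf{q}_{n-1}. \end{aligned}$$

(2) By normalizing these vectors, u_i = q_i/||q_i||, one obtains an orthonormal basis {u_1, ..., u_n} for U.

(3) By rewriting the equations in (1), one gets

$$\begin{aligned} \mathbf{c}_1 &= \mathbf{q}_1 = b_{11}\mathbf{u}_1, \\ \mathbf{c}_2 &= a_{12}\mathbf{q}_1 + \mathbf{q}_2 = b_{12}\mathbf{u}_1 + b_{22}\mathbf{u}_2, \\ &\;\;\vdots \\ \mathbf{c}_n &= a_{1n}\mathbf{q}_1 + \cdots + a_{n-1,n}\mathbf{q}_{n-1} + \mathbf{q}_n = b_{1n}\mathbf{u}_1 + \cdots + b_{nn}\mathbf{u}_n, \end{aligned}$$

where a_ij = ⟨q_i, c_j⟩/⟨q_i, q_i⟩ for i < j, a_jj = 1, and

$$b_{ij} = a_{ij}\|\mathbf{q}_i\| = \frac{\langle \mathbf{q}_i, \mathbf{c}_j\rangle}{\|\mathbf{q}_i\|} = \langle \mathbf{u}_i, \mathbf{c}_j\rangle$$

for i ≤ j, which is just the component of c_j in the u_i direction.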




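(4) Now the equations in (3) above can be written in matrix notation as

$$A = [\mathbf{c}_1 \cdots \mathbf{c}_n] = [\mathbf{u}_1 \cdots \mathbf{u}_n]\begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1n} \\ 0 & b_{22} & \cdots & b_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & b_{nn} \end{bmatrix} = QR,$$

where Q = [u_1 ⋯ u_n] is the m × n matrix with orthonormal columns, which form an orthonormal basis for C(A), and R is the n × n upper triangular matrix.

(5) Note that rank A = rank Q = n ≤ m, and C(Q) = U = C(A), which is of dimension n in R^m. Moreover, the matrix R is an invertible n × n matrix, since each b_jj = ⟨u_j, c_j⟩ is equal to ||c_j − P_{j−1}c_j||, and so b_jj ≠ 0 for all j because c_j ∉ C(A_{j−1}), where P_{j−1} is the projection onto C(A_{j−1}) and A_{j−1} = [c_1 ⋯ c_{j−1}].

(6) The orthonormality of the column vectors of Q means Q^TQ = I_n, and the j-th column of the matrix R is simply the coordinate vector of c_j with respect to the first j orthonormal vectors in the orthonormal basis B = {u_1, ..., u_n} for U: i.e.,

$$\mathbf{c}_j = \langle \mathbf{u}_1, \mathbf{c}_j\rangle\mathbf{u}_1 + \langle \mathbf{u}_2, \mathbf{c}_j\rangle\mathbf{u}_2 + \cdots + \langle \mathbf{u}_j, \mathbf{c}_j\rangle\mathbf{u}_j,$$

and so

$$[\mathbf{c}_j]_B = \begin{bmatrix} \langle \mathbf{u}_1, \mathbf{c}_j\rangle \\ \vdots \\ \langle \mathbf{u}_j, \mathbf{c}_j\rangle \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \begin{bmatrix} b_{1j} \\ \vdots \\ b_{jj} \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$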


Theorem 5.28 Any m × n matrix A of rank n can be factorized into a product A = QR, where Q is an m × n matrix with orthonormal columns and R is an n × n invertible upper triangular matrix with positive diagonal entries.

Definition 5.10 The decomposition A = QR is called the QR factorization or the QR decomposition of an m × n matrix A of rank n, where the matrix Q = [u_1 ⋯ u_n] is called the orthogonal part of A, and the matrix R = [b_ij] is called the upper triangular part of A.
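Steps (1) to (4) translate directly into a classical Gram-Schmidt routine. The following is a floating-point sketch under our own naming (`qr_gram_schmidt` is hypothetical), and it assumes rank A = n. Each entry of R is computed as b_ij = ⟨u_i, c_j⟩ for i < j, and each diagonal entry as b_jj = ||q_j||. The usage lines apply it to the matrix of Example 5.20 and form QQ^T.

```python
import math

def qr_gram_schmidt(A):
    """QR factorization of an m x n matrix A of rank n (list of rows) by classical
    Gram-Schmidt: b_ij = <u_i, c_j> for i < j and b_jj = ||q_j||."""
    m, n = len(A), len(A[0])
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    us, R = [], [[0.0] * n for _ in range(n)]
    for j, c in enumerate(cols):
        q = c[:]
        for i, u in enumerate(us):
            R[i][j] = sum(ui * ci for ui, ci in zip(u, c))    # component of c_j along u_i
            q = [qk - R[i][j] * uk for qk, uk in zip(q, u)]
        R[j][j] = math.sqrt(sum(x * x for x in q))          # = ||q_j|| > 0 as rank A = n
        us.append([x / R[j][j] for x in q])
    Q = [[us[j][i] for j in range(n)] for i in range(m)]   # u_j as the j-th column
    return Q, R

A = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 1]]           # the matrix of Example 5.20
Q, R = qr_gram_schmidt(A)
P = [[sum(Q[i][k] * Q[j][k] for k in range(3)) for j in range(4)] for i in range(4)]  # QQ^T
```

The routine should return R with rows (√2, 1/√2, 1/√2), (0, √3/√2, 1/√6), (0, 0, √7/√3), and QQ^T should equal the projection matrix found in Example 5.20. Classical Gram-Schmidt can lose orthogonality in floating point for nearly dependent columns. The Givens and Householder methods of Section 5.11 avoid that problem.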


Example 5.20 Find the QR decomposition of the following matrix:

$$A = [\mathbf{c}_1\ \mathbf{c}_2\ \mathbf{c}_3] = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$




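Solution: By the Gram-Schmidt orthogonalization, compute the orthogonal part Q and the upper triangular part R:

$$\begin{aligned} \mathbf{q}_1 &= \mathbf{c}_1 = (1, 1, 0, 0), \\ \mathbf{q}_2 &= \mathbf{c}_2 - \frac{\mathbf{c}_2\cdot\mathbf{q}_1}{\mathbf{q}_1\cdot\mathbf{q}_1}\mathbf{q}_1 = \left(\tfrac12, -\tfrac12, 1, 0\right), \\ \mathbf{q}_3 &= \mathbf{c}_3 - \frac{\mathbf{c}_3\cdot\mathbf{q}_2}{\mathbf{q}_2\cdot\mathbf{q}_2}\mathbf{q}_2 - \frac{\mathbf{c}_3\cdot\mathbf{q}_1}{\mathbf{q}_1\cdot\mathbf{q}_1}\mathbf{q}_1 = \left(-\tfrac23, \tfrac23, \tfrac23, 1\right), \end{aligned}$$

$$\begin{aligned} \mathbf{u}_1 &= \frac{\mathbf{q}_1}{\|\mathbf{q}_1\|} = \left(\tfrac{1}{\sqrt2}, \tfrac{1}{\sqrt2}, 0, 0\right), \\ \mathbf{u}_2 &= \frac{\mathbf{q}_2}{\|\mathbf{q}_2\|} = \left(\tfrac{1}{\sqrt6}, -\tfrac{1}{\sqrt6}, \tfrac{\sqrt2}{\sqrt3}, 0\right), \\ \mathbf{u}_3 &= \frac{\mathbf{q}_3}{\|\mathbf{q}_3\|} = \left(-\tfrac{2}{\sqrt{21}}, \tfrac{2}{\sqrt{21}}, \tfrac{2}{\sqrt{21}}, \tfrac{\sqrt3}{\sqrt7}\right). \end{aligned}$$

Then c_1 = √2 u_1, c_2 = (1/√2)u_1 + (√3/√2)u_2, and c_3 = (1/√2)u_1 + (1/√6)u_2 + (√7/√3)u_3, where the coefficients of c_j in the u_i directions are just u_i · c_j = b_ij for i ≤ j. Therefore,

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix} = [\mathbf{u}_1\ \mathbf{u}_2\ \mathbf{u}_3]\begin{bmatrix} \sqrt2 & 1/\sqrt2 & 1/\sqrt2 \\ 0 & \sqrt3/\sqrt2 & 1/\sqrt6 \\ 0 & 0 & \sqrt7/\sqrt3 \end{bmatrix} = QR,$$

where

$$Q = \begin{bmatrix} 1/\sqrt2 & 1/\sqrt6 & -2/\sqrt{21} \\ 1/\sqrt2 & -1/\sqrt6 & 2/\sqrt{21} \\ 0 & \sqrt2/\sqrt3 & 2/\sqrt{21} \\ 0 & 0 & \sqrt3/\sqrt7 \end{bmatrix},$$

and the orthogonal projection onto C(A) is

$$\mathrm{Proj}_{C(A)} = QQ^T = \frac17\begin{bmatrix} 6 & 1 & 1 & -2 \\ 1 & 6 & -1 & 2 \\ 1 & -1 & 6 & 2 \\ -2 & 2 & 2 & 3 \end{bmatrix}.$$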


Problem 5.32 Prove that QR factorization of an invertible matrix A is unique. 




In this QR decomposition of A, the columns u_i of Q form an orthonormal basis for C(A), and so the orthogonal projection matrix and the least square solution to Ax = b for b ∈ R^m can be calculated easily as

$$\mathrm{Proj}_U = \mathbf{u}_1\mathbf{u}_1^T + \cdots + \mathbf{u}_n\mathbf{u}_n^T = QQ^T \quad \left(= A(A^TA)^{-1}A^T = QR(R^TQ^TQR)^{-1}R^TQ^T\right),$$

$$(A\mathbf{x}_0 =)\ QR\mathbf{x}_0 = \mathbf{b}_c = QQ^T\mathbf{b} \quad\text{or}\quad R\mathbf{x}_0 = Q^T\mathbf{b} \qquad (\because\ Q^TQ = I_n).$$

Note that if A is a square matrix of rank n = m, then Q is an orthogonal matrix, and R is an invertible upper triangular matrix.


Corollary 5.29 Let A be an m × n matrix of rank n and let A = QR be its QR factorization. Then,

(1) the projection matrix onto the column space of A is [Proj_{C(A)}]_α = QQ^T;

(2) the least square solution of the system Ax = b is given by x_0 = R^{-1}Q^T b, which can be computed by back substitution in the system Rx = Q^T b.
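Corollary 5.29(2) can be illustrated as follows: factor A = QR, form Q^Tb, and solve the upper triangular system Rx = Q^Tb from the bottom row up. This is a floating-point sketch with hypothetical helper names. It repeats a compact Gram-Schmidt QR so that it stays self-contained, and uses the data of Example 5.15, whose exact answer is x_0 = (14/3, −1/3).

```python
import math

def qr_gram_schmidt(A):
    """Classical Gram-Schmidt QR of an m x n matrix A of rank n (list of rows)."""
    m, n = len(A), len(A[0])
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    us, R = [], [[0.0] * n for _ in range(n)]
    for j, c in enumerate(cols):
        q = c[:]
        for i, u in enumerate(us):
            R[i][j] = sum(a * b for a, b in zip(u, c))
            q = [x - R[i][j] * y for x, y in zip(q, u)]
        R[j][j] = math.sqrt(sum(x * x for x in q))
        us.append([x / R[j][j] for x in q])
    return [[us[j][i] for j in range(n)] for i in range(m)], R

def back_substitution(R, y):
    """Solve R x = y for an invertible upper triangular R, from the last row up."""
    n = len(R)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x

def least_squares_qr(A, b):
    """x0 = R^{-1} Q^T b, computed by back substitution on R x = Q^T b."""
    Q, R = qr_gram_schmidt(A)
    m, n = len(A), len(A[0])
    Qtb = [sum(Q[i][k] * b[i] for i in range(m)) for k in range(n)]
    return back_substitution(R, Qtb)

x0 = least_squares_qr([[1, 2], [1, 5], [0, 0]], [4, 3, 9])   # data of Example 5.15
```

Unlike the normal-equation route, this never forms A^TA. Forming A^TA squares the condition number, which is one reason QR is preferred in numerical work.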


Decompositions of a matrix A such as LU or QR are very useful for solving a system Ax = b. One can use A = LU and solve Ux = L^{-1}b when the system is consistent, and use A = QR to solve Rx = Q^T b and get a least square solution when it is inconsistent. Both systems are easy to solve, since U and R are upper triangular matrices.

For a practical computation of Q and R in the Gram-Schmidt orthogonalization, Gaussian elimination may be used, as follows.

Note that if A is a symmetric square matrix, then we get A = LDL^T by Gaussian elimination (when no row interchanges are needed). Furthermore, if all the pivots are nonnegative, we can write A = (LD^{1/2})(LD^{1/2})^T, which is called the Cholesky factorization of A. This idea provides a practical method of computing Q and R in A = QR even for a matrix A of rank r < n.

Let A be an m × n matrix of rank r. Then A^TA is a symmetric matrix of rank r. Find its LDU decomposition:

$$A^TA = LDL^T = L\begin{bmatrix} D_r & 0 \\ 0 & 0 \end{bmatrix}L^T,$$

where D_r is the diagonal matrix of order r of nonzero pivots. Then

$$D = L^{-1}(A^TA)(L^T)^{-1} = \left(A(L^{-1})^T\right)^T\left(A(L^{-1})^T\right) = \tilde{A}^T\tilde{A}$$

shows that the nonzero columns of Ã = A(L^{-1})^T are orthogonal, since the entries of the right side Ã^TÃ are the dot products of the columns of Ã



and they form the diagonal matrix D on the left side. Moreover, the nonzero pivots on the diagonal of D are just the squared lengths of the nonzero column vectors of Ã, which means that all the nonzero pivots are positive. Thus ÃD^{-1/2} = A(L^{-1})^T D^{-1/2} normalizes the nonzero columns of Ã, where

$$D^{-1/2} = \begin{bmatrix} D_r^{-1/2} & 0 \\ 0 & 0 \end{bmatrix}.$$

Hence A(L^{-1})^T D^{-1/2} = Q is the orthogonal part of A, and so A = QD^{1/2}L^T = QR, where R = D^{1/2}L^T is the upper triangular part of A with r nonzero rows. Note that, in Ã = A(L^{-1})^T, L^{-1} performs the Gaussian elimination on A^TA, and multiplying A on the right by its transpose just performs the same operations on the columns of A.


Theorem 5.30 (Gram-Schmidt orthogonalization II) Let A be an m × n matrix of rank r. Let A^TA = LDL^T be the LDU decomposition of A^TA. Then the r nonzero columns of Q = A(L^{-1})^T D^{-1/2} form an orthonormal basis for C(A), and A = QR, where Q and R = D^{1/2}L^T are of rank r: i.e.,

$$A = QR, \quad\text{where}\quad Q = A(L^{-1})^TD^{-1/2} \quad\text{and}\quad R = D^{1/2}L^T.$$
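Theorem 5.30 can be sketched in exact arithmetic. First factor A^TA = LDL^T by elimination without row exchanges. Then build Ã = A(L^{-1})^T column by column from Ã L^T = A, which gives ã_j = c_j − Σ_{k<j} L_{jk} ã_k. The function names are hypothetical, and this sketch relies on an assumption: when a pivot is zero, the rest of that column is zero too, which holds for a positive semidefinite A^TA. The result keeps D and Ã exact. Dividing each column j of Ã with d_j ≠ 0 by √d_j then gives Q.

```python
from fractions import Fraction as Fr

def ldl(S):
    """LDL^T of a symmetric matrix S by Gaussian elimination without row exchanges
    (exact). A zero pivot is skipped; for S = A^T A its remaining column is zero."""
    n = len(S)
    M = [[Fr(v) for v in row] for row in S]
    L = [[Fr(int(i == j)) for j in range(n)] for i in range(n)]
    d = [Fr(0)] * n
    for k in range(n):
        d[k] = M[k][k]
        if d[k] == 0:
            continue
        for i in range(k + 1, n):
            L[i][k] = M[i][k] / d[k]
            M[i] = [x - L[i][k] * y for x, y in zip(M[i], M[k])]
    return L, d

def orthogonal_columns(A):
    """Return (A_tilde, d), where A_tilde = A (L^{-1})^T and A^T A = L diag(d) L^T.
    Column j of A_tilde is c_j - sum_{k<j} L[j][k] * (column k of A_tilde)."""
    cols = [[Fr(v) for v in col] for col in zip(*A)]
    gram = [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]
    L, d = ldl(gram)
    tilde = []
    for j, c in enumerate(cols):
        t = c[:]
        for k in range(j):
            t = [x - L[j][k] * y for x, y in zip(t, tilde[k])]
        tilde.append(t)
    return [list(row) for row in zip(*tilde)], d      # back to a list of rows

A = [[1, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, 1]]       # the matrix of Example 5.21
A_tilde, d = orthogonal_columns(A)
```

For Example 5.21 this should give d = (2, 3/2, 7/3) and the columns (1, 1, 0, 0), (1/2, −1/2, 1, 0), (−2/3, 2/3, 2/3, 1) of Ã. For the rank-2 matrix of Example 5.14, the last pivot and the last column of Ã come out as zero.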


When rank A = r < n, one can still obtain the orthogonal projection as Proj_{C(A)} = QQ^T. For a least square solution of Ax = b, we solve QRx = QQ^T b, or Rx = Q^T b, since QQ^T b = b_c ∈ C(A),

$$Q^TQ = \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix},$$

and R and Q^T have only r nonzero rows.

There is one more important decomposition of A, which gives a general formula for the solutions of a general system. It will be discussed in Section 7.6.


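Example 5.21 Consider the matrix A in Example 5.20:

$$A = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}. \quad\text{Then}\quad A^TA = \begin{bmatrix} 2 & 1 & 1 \\ 1 & 2 & 1 \\ 1 & 1 & 3 \end{bmatrix}.$$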


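The LDU decomposition of A^TA is

$$A^TA = LDL^T = \begin{bmatrix} 1 & 0 & 0 \\ 1/2 & 1 & 0 \\ 1/2 & 1/3 & 1 \end{bmatrix}\begin{bmatrix} 2 & 0 & 0 \\ 0 & 3/2 & 0 \\ 0 & 0 & 7/3 \end{bmatrix}\begin{bmatrix} 1 & 1/2 & 1/2 \\ 0 & 1 & 1/3 \\ 0 & 0 & 1 \end{bmatrix}.$$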




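Thus,

$$A(L^{-1})^T = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & -1/2 & -1/3 \\ 0 & 1 & -1/3 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1/2 & -2/3 \\ 1 & -1/2 & 2/3 \\ 0 & 1 & 2/3 \\ 0 & 0 & 1 \end{bmatrix},$$

whose columns are orthogonal. By multiplying this matrix by D^{-1/2} on the right, we get

$$Q = A(L^{-1})^TD^{-1/2} = \begin{bmatrix} 1/\sqrt2 & 1/\sqrt6 & -2/\sqrt{21} \\ 1/\sqrt2 & -1/\sqrt6 & 2/\sqrt{21} \\ 0 & \sqrt2/\sqrt3 & 2/\sqrt{21} \\ 0 & 0 & \sqrt3/\sqrt7 \end{bmatrix},$$

whose columns are orthonormal (see Example 5.20). Now, the upper triangular matrix is

$$R = D^{1/2}L^T = \begin{bmatrix} \sqrt2 & 1/\sqrt2 & 1/\sqrt2 \\ 0 & \sqrt3/\sqrt2 & 1/\sqrt6 \\ 0 & 0 & \sqrt7/\sqrt3 \end{bmatrix}.$$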


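Example 5.22 Find all the least square solutions of Ax = b and the orthogonal projection Proj_{C(A)}b of b, where the matrix A and the vector b are given as

$$A = \begin{bmatrix} 1 & 1 & \sqrt2 \\ 0 & \sqrt2 & 2 \\ 0 & 1 & \sqrt2 \end{bmatrix}, \quad\text{and}\quad \mathbf{b} = \begin{bmatrix} 1 \\ \sqrt3 \\ \sqrt3 \end{bmatrix}.$$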


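Solution: This system is inconsistent. The rank of the matrix A is 2, and so is that of the symmetric matrix

$$A^TA = \begin{bmatrix} 1 & 1 & \sqrt2 \\ 1 & 4 & 4\sqrt2 \\ \sqrt2 & 4\sqrt2 & 8 \end{bmatrix},$$

which is the one given in Example 1.19. Thus A^TA is not invertible, and its LDU decomposition is

$$A^TA = LDL^T = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ \sqrt2 & \sqrt2 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 & \sqrt2 \\ 0 & 1 & \sqrt2 \\ 0 & 0 & 1 \end{bmatrix}.$$

Thus,

$$A(L^{-1})^T = \begin{bmatrix} 1 & 1 & \sqrt2 \\ 0 & \sqrt2 & 2 \\ 0 & 1 & \sqrt2 \end{bmatrix}\begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -\sqrt2 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \sqrt2 & 0 \\ 0 & 1 & 0 \end{bmatrix},$$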




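whose columns are orthogonal. By multiplying this matrix by D^{-1/2} on the right, we get

$$Q = A(L^{-1})^TD^{-1/2} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \sqrt2 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1/\sqrt3 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \sqrt2/\sqrt3 & 0 \\ 0 & 1/\sqrt3 & 0 \end{bmatrix},$$

whose nonzero columns are orthonormal. Now, the upper triangular matrix is

$$R = D^{1/2}L^T = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \sqrt3 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 1 & \sqrt2 \\ 0 & 1 & \sqrt2 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 1 & \sqrt2 \\ 0 & \sqrt3 & \sqrt6 \\ 0 & 0 & 0 \end{bmatrix}.$$

Note that both Q and R are of rank 2. We now solve Rx = Q^T b:

$$\begin{bmatrix} 1 & 1 & \sqrt2 \\ 0 & \sqrt3 & \sqrt6 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \sqrt2/\sqrt3 & 1/\sqrt3 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ \sqrt3 \\ \sqrt3 \end{bmatrix} = \begin{bmatrix} 1 \\ \sqrt2 + 1 \\ 0 \end{bmatrix}.$$

By back substitution with z = 0, we get a least square solution:

$$\mathbf{x}_0 = \begin{bmatrix} 1 - \frac{\sqrt2 + 1}{\sqrt3} \\ \frac{\sqrt2 + 1}{\sqrt3} \\ 0 \end{bmatrix}.$$

A simple computation shows that [0 −√2 1]^T spans the null space N(A) of A. Thus all the least square solutions are in x_0 + N(A):

$$\mathbf{x} = \begin{bmatrix} 1 - \frac{\sqrt2 + 1}{\sqrt3} \\ \frac{\sqrt2 + 1}{\sqrt3} \\ 0 \end{bmatrix} + \lambda\begin{bmatrix} 0 \\ -\sqrt2 \\ 1 \end{bmatrix}, \qquad \lambda \in \mathbb{R}.$$

Now the orthogonal projection of b onto the column space C(A) is Ax_0 = b_c. Equivalently, from the orthogonal projection matrix, we get

$$\mathrm{Proj}_{C(A)}\mathbf{b} = QQ^T\mathbf{b} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2/3 & \sqrt2/3 \\ 0 & \sqrt2/3 & 1/3 \end{bmatrix}\begin{bmatrix} 1 \\ \sqrt3 \\ \sqrt3 \end{bmatrix} = \begin{bmatrix} 1 \\ \frac{2 + \sqrt2}{\sqrt3} \\ \frac{\sqrt2 + 1}{\sqrt3} \end{bmatrix} = \mathbf{b}_c.$$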




5.11 QR decomposition II 


There are some other important ways of decomposing a rectangular matrix 
A into QR forms. For a symmetric square matrix, these methods may be 
used to compute its eigenvalues directly. 


(1) Recall that the least square problem for Ax = b is equivalent to minimizing ||Ax − b||. In this case, the LU decomposition of A, which is actually Gaussian elimination, does not help much, since there is no reason why the least square solutions of Ax = b and Ux = L^{-1}b should be equal.

However, if we can reduce Ax = b to row echelon form by multiplying both sides of the equation by orthogonal matrices Q, the reduced form and the original form will have the same least square solutions, since the quantity to be minimized does not change:

$$\|QA\mathbf{x} - Q\mathbf{b}\| = \|Q(A\mathbf{x} - \mathbf{b})\| = \|A\mathbf{x} - \mathbf{b}\|.$$


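Among the simplest candidates for such orthogonal matrices are the Givens rotation matrices, which are of the following form: for i < j,

$$Q_{ij} = \begin{bmatrix} I & & & & \\ & c & & s & \\ & & I & & \\ & -s & & c & \\ & & & & I \end{bmatrix}_{m\times m}, \qquad c^2 + s^2 = 1,$$

where c, s, −s, c occupy the (i, i), (i, j), (j, i), (j, j) positions, respectively, and all other entries agree with those of the m × m identity matrix. It is quite clear that Q_ij^TQ_ij = I, and the i-th and j-th rows of Q_ijA = B are

$$c\,a_{ik} + s\,a_{jk} = b_{ik}, \qquad -s\,a_{ik} + c\,a_{jk} = b_{jk}, \qquad k = 1, \ldots, n.$$

In particular, for any column index ℓ, if we choose

$$c = \frac{a_{i\ell}}{(a_{i\ell}^2 + a_{j\ell}^2)^{1/2}}, \qquad s = \frac{a_{j\ell}}{(a_{i\ell}^2 + a_{j\ell}^2)^{1/2}},$$

then we get b_iℓ = (a_iℓ^2 + a_jℓ^2)^{1/2} and b_jℓ = 0 for j > i. Thus, taking a_iℓ as the pivot, the lower entries a_jℓ are killed one at a time, and the pivot is also changed. Suppose that a_iℓ = 0. Then the denominator in c and s is zero only if a_jℓ = 0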




also. In this case, no transformation is necessary. If aj, Æ 0, then c = 0 
and s = +1 so that bje = taj = 0. Therefore, arbitrary rectangular matrix 
can be reduced to a row echelon form by orthogonal transformations of the 
Givens rotation matrices, just like the forward eliminations: 


Qh QTA=R, or A=Qi =: QR=QR, 


where each $Q_i$ is a Givens rotation matrix, $Q = Q_1 \cdots Q_h$ is orthogonal, and R is in row echelon form (an upper triangular matrix when A is square). This decomposition of A can be used to solve Ax = b just as the LU decomposition is, and it can be used for the least square problem as well.
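The Givens reduction just described can be sketched as follows, assuming NumPy; `givens_qr` is a hypothetical helper name. This sketch always takes the pivot on the diagonal, without the column skipping a genuine row echelon form would need, so for a rank-deficient A it returns an upper trapezoidal R. It applies $Q_{ij}^T$ to two rows at a time and accumulates $Q = Q_1Q_2\cdots$ by mixing two columns at a time, never forming an m × m rotation matrix.

```python
import numpy as np


def givens_qr(A):
    """Return (Q, R) with A = Q R, Q orthogonal, built from Givens rotations.

    Sketch only: pivots are taken on the diagonal, with no column skipping,
    so for a rank-deficient A the result is upper trapezoidal rather than a
    true row echelon form.
    """
    R = np.array(A, dtype=float)
    m, n = R.shape
    Q = np.eye(m)
    for l in range(min(m - 1, n)):        # column of the current pivot
        i = l                             # pivot row
        for j in range(i + 1, m):         # kill the entries below the pivot
            a, b = R[i, l], R[j, l]
            r = np.hypot(a, b)            # (a_il^2 + a_jl^2)^(1/2)
            if r == 0.0:                  # both entries zero: no rotation needed
                continue
            c, s = a / r, b / r
            # B = Q_ij^T R changes only rows i and j:
            #   b_ik = c a_ik + s a_jk,   b_jk = -s a_ik + c a_jk
            row_i, row_j = R[i].copy(), R[j].copy()
            R[i] = c * row_i + s * row_j
            R[j] = -s * row_i + c * row_j
            R[j, l] = 0.0                 # exact zero instead of round-off
            # Accumulate Q = Q_1 Q_2 ...: right-multiplying by Q_ij mixes columns i, j
            col_i, col_j = Q[:, i].copy(), Q[:, j].copy()
            Q[:, i] = c * col_i + s * col_j
            Q[:, j] = -s * col_i + c * col_j
    return Q, R


A = np.array([[3.0, 1.0], [4.0, 2.0], [0.0, 5.0]])
Q, R = givens_qr(A)
print(np.round(R, 6))
```

Because each rotation touches only two rows, Givens rotations are attractive when only a few subdiagonal entries need to be killed, for example in a matrix that is already nearly triangular.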


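(2) Other orthogonal matrices that transform an m × n matrix A into a row echelon form are the Householder matrices, which are of the form

$$H = I - 2uu^T$$

for a unit vector $u \in \mathbb{R}^m$. Such an H is the reflection through the hyperplane orthogonal to u: it sends u to -u and fixes every vector orthogonal to u. Clearly, H is symmetric and orthogonal (i.e., $H^TH = I$). We now take a unit vector of the form $u_i = (0, \ldots, 0, u_i, \ldots, u_m) \in \mathbb{R}^m$. Then

$$H_i = I - 2u_iu_i^T = \begin{bmatrix} I_{i-1} & 0 \\ 0 & * \end{bmatrix},$$

so $H_iA$ changes only the i-th through m-th rows of A. To make the (i+1)-th through m-th entries of the ℓ-th column of A zero (i.e., to kill everything below the pivot $a_{i\ell}$), let $a^\ell$ be the ℓ-th column of A and look at the computation

$$H_ia^\ell = (I - 2u_iu_i^T)a^\ell = a^\ell - 2\alpha u_i, \quad \alpha = u_i^Ta^\ell, \qquad
\begin{bmatrix} a_{1\ell} \\ \vdots \\ a_{i\ell} - 2\alpha u_i \\ a_{i+1,\ell} - 2\alpha u_{i+1} \\ \vdots \\ a_{m\ell} - 2\alpha u_m \end{bmatrix}
= \begin{bmatrix} a_{1\ell} \\ \vdots \\ a_{i\ell} - 2\alpha u_i \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

Therefore, we take $u_i$ with

$$u_j = \frac{a_{j\ell}}{2\alpha}, \qquad j = i+1, \ldots, m.$$

By definition, $\alpha = u_ia_{i\ell} + \cdots + u_ma_{m\ell}$ and $u_i^2 + \cdots + u_m^2 = 1$. Define

$$s = a_{i\ell}^2 + \cdots + a_{m\ell}^2, \qquad \beta = 2\alpha u_i - a_{i\ell}, \quad\text{or}\quad u_i = \frac{a_{i\ell} + \beta}{2\alpha},$$

and plug these into the above equations to get

$$2\alpha^2 = s + a_{i\ell}\beta, \qquad 4\alpha^2 = s + 2a_{i\ell}\beta + \beta^2.$$

By solving these equations, we get

$$\beta = \pm s^{1/2}, \qquad 2\alpha = [2\beta(a_{i\ell} + \beta)]^{1/2},$$

and the new pivot is $a_{i\ell} - 2\alpha u_i = -\beta$. If $a_{i+1,\ell}, \ldots, a_{m\ell}$ are almost zero already, then $s \approx a_{i\ell}^2$, and β will be nearly $\pm|a_{i\ell}|$. We take β to have the same sign as $a_{i\ell}$; otherwise $a_{i\ell} + \beta \approx 0$, $\alpha \approx 0$, and all entries of $u_i$ will be approximately of the indeterminate form 0/0. With this choice of $u_i$, $H_i = I - 2u_iu_i^T$ eliminates everything below the pivot $a_{i\ell}$. Thus, we have

$$H_h \cdots H_1A = R, \quad\text{or}\quad A = QR \ \text{ with } Q = H_1^T \cdots H_h^T,$$

where Q is an orthogonal matrix and R is in row echelon form (an upper triangular matrix when A is square). In practice, we compute $H_iA = A - 2u_i(u_i^TA)$ without forming $H_i$, which saves a great deal of computation.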
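The Householder formulas above translate almost line by line into code. The following is a minimal sketch assuming NumPy; `householder_qr` is a hypothetical helper name, pivots are again taken on the diagonal, and rank-deficient columns are not treated specially. Each reflection is applied as $R - 2u(u^TR)$, as recommended in the last sentence above, and Q is accumulated as $H_1H_2\cdots$ (each $H_i$ is symmetric).

```python
import numpy as np


def householder_qr(A):
    """Return (Q, R) with A = Q R, using reflections H_i = I - 2 u_i u_i^T.

    Follows the formulas above: s = a_il^2 + ... + a_ml^2, beta = +/- s^(1/2)
    with the sign of a_il, 2*alpha = [2 beta (a_il + beta)]^(1/2),
    u_i = (a_il + beta) / (2 alpha), u_j = a_jl / (2 alpha) for j > i.
    Pivots are taken on the diagonal (a sketch, no rank-deficiency handling).
    """
    R = np.array(A, dtype=float)
    m, n = R.shape
    Q = np.eye(m)
    for l in range(min(m - 1, n)):
        i = l
        x = R[i:, l].copy()               # entries a_il, ..., a_ml
        s = x @ x
        if s == 0.0:                      # nothing to eliminate in this column
            continue
        beta = np.sqrt(s) if x[0] >= 0 else -np.sqrt(s)
        two_alpha = np.sqrt(2.0 * beta * (x[0] + beta))
        u = np.zeros(m)
        u[i] = (x[0] + beta) / two_alpha
        u[i + 1:] = x[1:] / two_alpha
        # H_i R = R - 2 u (u^T R): the m x m matrix H_i is never formed
        R -= 2.0 * np.outer(u, u @ R)
        R[i + 1:, l] = 0.0                # exact zeros below the new pivot -beta
        # Q = H_1 H_2 ... H_h: right-multiply by H_i, i.e. Q - 2 (Q u) u^T
        Q -= 2.0 * np.outer(Q @ u, u)
    return Q, R


A = np.array([[3.0, 1.0], [4.0, 2.0], [0.0, 5.0]])
Q, R = householder_qr(A)
print(np.round(R, 6))
```

For this A the first column is (3, 4, 0), so s = 25 and β = 5, and the new pivot R[0, 0] should come out as -β = -5.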


5.12 Exercises 


5.1. Decide which of the following functions on $\mathbb{R}^2$ are inner products and which are not. For $x = (x_1, x_2)$, $y = (y_1, y_2)$ in $\mathbb{R}^2$,

(1) $\langle x, y\rangle = x_1y_1x_2y_2$,

(2) $\langle x, y\rangle = 4x_1y_1 + 4x_2y_2 - x_1y_2 - x_2y_1$,

(3) $\langle x, y\rangle = x_1y_2 - x_2y_1$,

(4) $\langle x, y\rangle = x_1y_1 + 3x_2y_2$,

(5) $\langle x, y\rangle = 2x_1y_1 - x_1y_2 - x_2y_1 + 3x_2y_2$.

5.2. Show that the function $\langle A, B\rangle = \operatorname{tr}(A^TB)$ for $A, B \in M_{n\times n}(\mathbb{R})$ defines an inner product on $M_{n\times n}(\mathbb{R})$.

5.3. Find the angle between the vectors (4, 7, 9, 1, 3) and (2, 1, 1, 6, 8) in $\mathbb{R}^5$.


5.4. Determine the values of k so that the given vectors are orthogonal with respect to the Euclidean inner product in $\mathbb{R}^4$.

2 1 2 2
3 k 8 -6
(1) k 2 3 7 (2) 4 2 2
4 -5 k k

5.5. Consider the space C[0, 1] with the inner product defined by

$$\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx.$$

Compute the length of each vector and the cosine of the angle between each pair of vectors in each of the following:

(1) f(x) = 1, g(x) = x;
(2) $f(x) = x^m$, $g(x) = x^n$, where m, n are nonnegative integers;
(3) $f(x) = \sin m\pi x$, $g(x) = \sin n\pi x$, where m, n are integers.

5.6. Prove that

$$(a_1 + \cdots + a_n)^2 \le n(a_1^2 + \cdots + a_n^2)$$

for any real numbers $a_1, a_2, \ldots, a_n$. When does equality hold?

5.7. Let $V = P_2([0, 1])$ be the vector space of polynomials of degree ≤ 2 on [0, 1] equipped with the inner product

$$\langle f, g\rangle = \int_0^1 f(t)g(t)\,dt.$$

(1) Compute ⟨f, g⟩ and ||f|| for f(x) = x + 2 and $g(x) = x^2 - 2x - 3$.
(2) Find the orthogonal complement of the subspace of scalar polynomials.

5.8. Find an orthonormal basis for the Euclidean 3-space $\mathbb{R}^3$ by applying the Gram-Schmidt orthogonalization to the vectors $x_1 = (1, 0, 1)$, $x_2 = (1, 0, -1)$, $x_3 = (0, 3, 4)$.

5.9. Show that if u is orthogonal to v, then every scalar multiple of u is also orthogonal to v. Find a unit vector orthogonal to $v_1 = (1, 1, 2)$ and $v_2 = (0, 1, 3)$ in the Euclidean 3-space $\mathbb{R}^3$.

5.10. Determine the orthogonal projection of $v_1$ onto $v_2$ for the following vectors in the n-space $\mathbb{R}^n$ with the Euclidean inner product.
(1) $v_1 = (1, 2, 3)$, $v_2 = (1, 1, 2)$,
(2) $v_1 = (1, 2, 1)$, $v_2 = (2, 1, -1)$,
(3) $v_1 = (1, 0, 1, 0)$, $v_2 = (0, 2, 2, 0)$.

5.11. Let $S = \{v_i\}$, where the $v_i$'s are given below. For each S, find a basis for $S^\perp$ with respect to the Euclidean inner product on $\mathbb{R}^n$.
(1) $v_1 = (0, 1, 0)$, $v_2 = (0, 0, 1)$,
(1) vi = (0, 1, 0), vo = (0, 0, 1), 


(2) $v_1 = (1, 1, 0)$, $v_2 = (1, 1, 1)$,
(3) $v_1 = (1, 0, 1, 2)$, $v_2 = (1, 1, 1, 1)$, $v_3 = (2, 2, 0, 1)$.

5.12. Which of the following matrices are orthogonal?
1/2 =1/3 4/5 —38/5
(1) EE h (2) E ake
| 1/V2 0 -1//2 eve 1/V3 —1/v6
(3) 0 -1/v2 13|, 4) | a/v? -1/v3 1/v6
kea aia Lo v3 2v6

5.13. Let W be the subspace of the Euclidean 4-space $\mathbb{R}^4$ consisting of all vectors that are orthogonal to both x = (1, 0, -1, 1) and y = (2, 3, -1, 2). Find a basis for the subspace W.

5.14. Let V be an inner product space. For vectors x and y in V, establish the following identities:
(1) $\langle x, y\rangle = \frac14\|x + y\|^2 - \frac14\|x - y\|^2$ (Polarization identity),
(2) $\langle x, y\rangle = \frac12\left(\|x + y\|^2 - \|x\|^2 - \|y\|^2\right)$ (Polarization identity),
(3) $\|x + y\|^2 + \|x - y\|^2 = 2\left(\|x\|^2 + \|y\|^2\right)$ (Parallelogram equality).

5.15. Show that x + y is perpendicular to x - y if and only if ||x|| = ||y||.

5.16. Let $A = [c_1 \cdots c_n]$ be an m × n matrix whose columns are $c_i \in \mathbb{R}^m$. Prove that the volume of the n-dimensional parallelepiped P(A) determined by those vectors $c_i$'s in $\mathbb{R}^m$ is given by

$$\operatorname{vol}(P(A)) = \sqrt{\det(A^TA)}.$$

(Note that the volume of the n-dimensional parallelepiped determined by $c_1, \ldots, c_n$ in $\mathbb{R}^m$ is by definition the product of the volume of the (n-1)-dimensional parallelepiped (base) determined by $c_2, \ldots, c_n$ and the height of $c_1$ from the plane W spanned by $c_2, \ldots, c_n$. Here, the height is the length of the vector $a = c_1 - \operatorname{Proj}_W(c_1)$. If the vectors are linearly dependent, then the parallelepiped is degenerate, i.e., it is contained in a subspace of dimension less than n.)


5.17. Find the volume of the three-dimensional tetrahedron in the Euclidean 4-space $\mathbb{R}^4$ whose vertices are at (0, 0, 0, 0), (1, 0, 0, 0), (0, 1, 2, 2) and (0, 0, 1, 2).

5.18. Find orthonormal bases for the row space and the null space of each of the following matrices.

p24 3 [i 4 0 EEA
Do i AA AA e a eal N a T
[201 | 0 0 2 ito ice

5.19. Let A be an m × n matrix of rank k. Find a relation of m, n and k so that Ax = b has infinitely many solutions for every $b \in \mathbb{R}^m$.

5.20. Find the 2 × 2 matrix P that projects the xy-plane onto the line y = x.

5.21. Find the equation of the straight line that fits best the data of the four points (0, 1), (1, 3), (2, 4), and (3, 4).

5.22. Find the cubic polynomial that fits best the data of the five points (-1, -14), (0, -5), (1, -4), (2, 1), and (3, 22).

5.23. Let W be the subspace of the Euclidean 4-space $\mathbb{R}^4$ spanned by the vectors $x_i$'s given in each of the following problems. Find the projection matrix P for the subspace W and the null space N(P) of P. Compute Pb for b given in each problem.

(1) xı Ss 1, 1), $x_2 = (1, -1, 1, -1)$, $x_3 = (-1, 1, 1, 0)$, and b = (1, 2, 1, 1).

(2) $x_1 = (0, -2, 2, 1)$, $x_2 = (2, 0, -1, 2)$, and b = (1, 1, 1, 1).

(3) $x_1 = (2, 0, 3, -6)$, $x_2 = (-3, 6, 8, 0)$, and b = (-1, 2, -1, 1).

5.24. Find the projection matrix for the row space and the null space of each of the following matrices:
Se pia o
5 5 24 1
(i), Ng, 8) , 3) {00 2
1 2 iia | T
v5 v5

5.25. Consider the space C[-1, 1] with the inner product defined by

$$\langle f, g\rangle = \int_{-1}^{1} f(x)g(x)\,dx.$$

A function $f \in C[-1, 1]$ is even if f(-x) = f(x), or odd if f(-x) = -f(x). Let U and V be the sets of all even functions and odd functions in C[-1, 1], respectively.

(1) Prove that U and V are subspaces and C[-1, 1] = U + V.

(2) Prove that U ⊥ V.

(3) Prove that for any $f \in C[-1, 1]$, $\|f\|^2 = \|h\|^2 + \|g\|^2$ where $f = h + g \in U \oplus V$.

5.26. Find the QR factorization of the matrix
sinf cos
cos 6 0 i

5.27. For an orthogonal matrix A, show that det A = ±1. Give an example of an orthogonal matrix A for which det A = -1.


5.28. Determine whether the following statements are true or false, in general, and justify your answers.

(1) Two vectors x and y in an inner product space are linearly independent if and only if the angle between x and y is not zero.

(2) If V is perpendicular to W, then $V^\perp$ is perpendicular to $W^\perp$.

(3) Every permutation matrix is an orthogonal matrix.

(4) The projection of the Euclidean m-space $\mathbb{R}^m$ on a subspace W is a linear transformation of $\mathbb{R}^m$ into itself.

(5) Two different subspaces of the Euclidean m-space $\mathbb{R}^m$ may have the same projection matrix.

(6) An n × n symmetric matrix A is a projection matrix if and only if $A^2 = A$.

(7) For any m × n matrix A and $b \in \mathbb{R}^m$, $A^TAx = A^Tb$ always has a solution.

(8) An inner product can be defined on every vector space.

(9) A linear transformation T is an isomorphism if and only if it is an isometry.

(10) Let V be an inner product space. Then ||x - y|| ≥ ||x|| - ||y|| for any vectors x and y in V.

(11) The least square solution of Ax = b is unique for any symmetric matrix A.

(12) Every system of linear equations has a least square solution.

(13) The least square solution of Ax = b is the orthogonal projection of b on the column space of A.

(14) An isometry is a surjective linear transformation.


Chapter 6 


Diagonalization 


6.1 Diagonalization of matrices 


Gaussian elimination plays a fundamental role in solving a consistent system Ax = b of linear equations. In general, instead of solving the given system, one could solve the normal equation $A^TAx = A^Tb$ to obtain a least square solution. Note that the matrix $A^TA$ is a symmetric square matrix, and so one may assume that the matrix in the system is a square matrix.

Recall that a square matrix A, as a linear transformation on $\mathbb{R}^n$, may have various matrix representations depending on the choice of bases, and these representations are all similar to one another. One may now ask whether there is some particular basis β with respect to which the matrix representation $[A]_\beta$ of A is as simple as possible, say a diagonal matrix D. If this is the case, then there is an invertible matrix Q such that $QDQ^{-1} = A$, since A itself is the matrix representation with respect to the standard basis α for $\mathbb{R}^n$.


Definition 6.1 A square matrix A is said to be diagonalizable if there exists an invertible matrix Q such that $Q^{-1}AQ = D$ is a diagonal matrix (i.e., A is similar to a diagonal matrix).

If a square matrix A is diagonalizable, then the similarity $A = QDQ^{-1}$ gives an easy way to solve some problems related to the matrix A:

(1) solving systems Ax = b of linear equations,

(2) checking the invertibility of A or evaluating det A,

(3) calculating a power $A^k$ or the limit of a matrix series $\sum_{k=0}^{\infty} A^k$,

(4) solving systems of linear differential equations or difference equations,

(5) finding a simple form of the matrix representation of a linear transformation, etc.




226 Chapter 6. Eigenvectors and Eigenvalues 


For instance, for (1) above, a system of linear equations Ax = b with a diagonalizable square matrix A can be written as $QDQ^{-1}x = b$, or equivalently $DQ^{-1}x = Q^{-1}b$. Hence, for $c = Q^{-1}b$, the solution y of Dy = c yields the solution x = Qy of the original problem.
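The three steps $c = Q^{-1}b$, $y = D^{-1}c$, $x = Qy$ can be sketched with NumPy as follows. The function name `solve_by_diagonalization` and the 2 × 2 matrix are illustrative assumptions; the eigenvectors come from `np.linalg.eig`, and the sketch assumes real, nonzero eigenvalues.

```python
import numpy as np


def solve_by_diagonalization(A, b):
    """Solve Ax = b for a diagonalizable A = Q D Q^{-1}.

    c = Q^{-1} b, then D y = c decouples into y_i = c_i / lambda_i, then x = Q y.
    Assumes A is diagonalizable with real, nonzero eigenvalues.
    """
    lams, Q = np.linalg.eig(A)        # columns of Q are eigenvectors
    c = np.linalg.solve(Q, b)         # c = Q^{-1} b (Q^{-1} is never formed)
    y = c / lams                      # D y = c: one scalar equation per row
    return Q @ y                      # x = Q y


A = np.array([[1.0, 4.0], [3.0, 2.0]])   # illustrative; eigenvalues 5 and -2
b = np.array([1.0, 2.0])
x = solve_by_diagonalization(A, b)
print(x, np.linalg.solve(A, b))
```

For a single system this is not cheaper than Gaussian elimination, since computing Q already costs more. The pay-off comes when the same A is reused many times, as with the powers $A^k$ in item (3).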

Let us now discuss when diagonalization of a matrix is possible. Suppose that there is a basis $\beta = \{x_1, \ldots, x_n\}$ such that $[A]_\beta = D$ is a diagonal matrix. Then $[A]_\beta = [Id]^\beta_\alpha[A]_\alpha[Id]^\alpha_\beta$, or $D = Q^{-1}AQ$, or AQ = QD, where $Q = [Id]^\alpha_\beta = [[x_1]_\alpha \cdots [x_n]_\alpha] = [x_1 \cdots x_n]$ is the transition matrix and α is the standard basis for $\mathbb{R}^n$. By expansion,

$$AQ = [Ax_1 \cdots Ax_n] = QD = [x_1 \cdots x_n]\begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} = [\lambda_1x_1 \cdots \lambda_nx_n].$$

This means that if A has a diagonal matrix representation with respect to some basis β, then the basis vectors $x_i$ of β must satisfy $Ax_i = \lambda_ix_i$ for all i = 1, ..., n. That is, the image $Ax_i$ of each vector $x_i$ under the transformation A must be parallel to $x_i$, up to the scalar $\lambda_i$. Such vectors $x_i$ are called eigenvectors, and the scalars $\lambda_i$ are called eigenvalues of A. In fact, they play important roles in their own right and have far-reaching applications not only in mathematics but also in other fields of science and engineering.


Definition 6.2 Let A be an n × n square matrix. A nonzero vector x in the n-space $\mathbb{R}^n$ is called an eigenvector (or characteristic vector) of A if there is a scalar λ in $\mathbb{R}$ such that

$$Ax = \lambda x.$$

The scalar λ is called an eigenvalue (or characteristic value) of A, and we say x belongs to λ.

Theorem 6.1 Let A be an n × n matrix. Then A is diagonalizable if and only if A has n linearly independent eigenvectors.

Proof: As we have seen above, diagonalizability implies $Ax_i = \lambda_ix_i$ for i = 1, ..., n, where the $x_i$ are the columns of Q. Conversely, if the columns of Q are eigenvectors, then AQ = QD with the eigenvalues on the diagonal of D. Finally, Q is invertible if and only if its columns are linearly independent.


6.1. | Diagonalization of matrices 227 


Thus, to diagonalize a matrix A, it suffices to find a basis of eigenvectors of A:

Step 1. Find n linearly independent eigenvectors $x_1, x_2, \ldots, x_n$ of A.

Step 2. Form the matrix $Q = [x_1\ x_2 \cdots x_n]$ with the $x_i$'s as its columns.

Step 3. Then $Q^{-1}AQ = D$, whose diagonal entries are the eigenvalues $\lambda_1, \ldots, \lambda_n$ associated with the eigenvectors $x_j$, j = 1, 2, ..., n.
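The three steps can be carried out numerically as in the following sketch, assuming NumPy. The 3 × 3 matrix is an illustrative choice with distinct eigenvalues 2, 3, 5; for such a matrix `np.linalg.eig` reliably returns linearly independent eigenvectors. For repeated eigenvalues the computed eigenvectors can be nearly dependent, so the rank check in Step 2 matters.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])        # illustrative; distinct eigenvalues 2, 3, 5

# Step 1: eigenvalues and eigenvectors (the eigenvectors are the columns of X)
lams, X = np.linalg.eig(A)

# Step 2: Q = [x_1 x_2 x_3]; it must be invertible
Q = X
if np.linalg.matrix_rank(Q) < A.shape[0]:
    raise ValueError("A has no basis of eigenvectors")

# Step 3: Q^{-1} A Q is diagonal, with the eigenvalues in the order of the columns of Q
D = np.linalg.solve(Q, A @ Q)
print(np.round(lams, 6))
print(np.round(D, 6))
```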


The problem now is how one can find such a basis of eigenvectors. 

Note that if x is an eigenvector of A belonging to an eigenvalue λ, then so is every nonzero scalar multiple of x. Moreover, the set of all eigenvectors of A belonging to λ, together with the zero vector, is a subspace of $\mathbb{R}^n$, called the eigenspace of A belonging to λ and denoted by $E_\lambda(A)$, or just $E_\lambda$ if there is no confusion about the matrix A. Since Ax is parallel to x for every $x \in E_\lambda$, the subspace $E_\lambda$ is invariant under the linear transformation $A : \mathbb{R}^n \to \mathbb{R}^n$ in the sense that $A(E_\lambda) \subseteq E_\lambda$. Such a subspace is called an invariant subspace.

By definition, a nonzero vector x is in $E_\lambda$ if and only if x is a nontrivial solution of the homogeneous system $(\lambda I - A)x = 0$, i.e., $x\,(\neq 0) \in N(\lambda I - A)$, and so $N(\lambda I - A) = E_\lambda(A)$. Moreover, $N(\lambda I - A)$ is nontrivial if and only if λ satisfies the equation

$$\det(\lambda I - A) = 0 = \det(A - \lambda I).$$

Note that the left side of this equation is a polynomial in λ of degree n, called the characteristic polynomial of A. Thus the eigenvalues are simply the zeros of the characteristic polynomial.

Therefore, to find the eigenvectors of A, one first determines the eigenvalues of A (the zeros of the characteristic polynomial) and then solves the homogeneous system $(\lambda I - A)x = 0$ for each eigenvalue λ. In summary, by referring to Theorem 2.26 we have the following theorem.


Theorem 6.2 For any square matrix A, the following are equivalent:
(1) $\lambda_0$ is an eigenvalue of A;
(2) $\det(\lambda_0I - A) = 0$, i.e., $\lambda_0$ is a zero of the characteristic polynomial;
(3) $\lambda_0I - A$ is singular;
(4) the homogeneous system $(\lambda_0I - A)x = 0$ has a nontrivial solution, i.e., $\dim N(\lambda_0I - A) = \dim E_{\lambda_0}(A) \ge 1$.
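The recipe of Theorem 6.2 (zeros of the characteristic polynomial first, then the null space of $\lambda I - A$) can be mimicked numerically. The sketch below is an illustration under stated assumptions. It computes the coefficients of $\det(\lambda I - A)$ with the Faddeev-LeVerrier recursion, a technique swapped in for this example rather than taken from the book. It finds the zeros with `np.roots` and measures $\dim E_\lambda = n - \operatorname{rank}(\lambda I - A)$ with `np.linalg.matrix_rank`. The helper names `char_poly` and `eigenspace_dim` are hypothetical, and the matrix is the one used in Example 6.2.

```python
import numpy as np


def char_poly(A):
    """Coefficients [1, c_{n-1}, ..., c_0] of det(lambda I - A), highest degree first.

    Faddeev-LeVerrier recursion (chosen for this sketch):
    M_0 = 0, c_n = 1, M_k = A M_{k-1} + c_{n-k+1} I, c_{n-k} = -tr(A M_k) / k.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    coeffs = [1.0]
    M = np.zeros((n, n))
    for k in range(1, n + 1):
        M = A @ M + coeffs[-1] * np.eye(n)
        coeffs.append(-np.trace(A @ M) / k)
    return np.array(coeffs)


def eigenspace_dim(A, lam, tol=1e-9):
    """dim E_lam(A) = dim N(lam I - A) = n - rank(lam I - A)."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(lam * np.eye(n) - A, tol=tol)


A = np.array([[3.0, -2.0, 0.0],
              [-2.0, 3.0, 0.0],
              [0.0, 0.0, 5.0]])        # the matrix of Example 6.2
p = char_poly(A)                       # (lambda - 1)(lambda - 5)^2
lams = np.roots(p)                     # zeros of the characteristic polynomial
for lam in (1.0, 5.0):
    print(lam, eigenspace_dim(A, lam))
```

Root finding on the characteristic polynomial is numerically fragile (a double root is only found to about half the working precision), which is why library routines such as `np.linalg.eig` avoid it; here it serves only to mirror the text.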


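Example 6.1 [Distinct real eigenvalues] Find the eigenvalues and eigenvectors of

$$A = \begin{bmatrix} 2 & \sqrt{2} \\ \sqrt{2} & 1 \end{bmatrix}.$$

Solution: The characteristic polynomial is

$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda - 2 & -\sqrt{2} \\ -\sqrt{2} & \lambda - 1 \end{bmatrix} = \lambda^2 - 3\lambda = \lambda(\lambda - 3).$$

Thus the eigenvalues are $\lambda_1 = 0$ and $\lambda_2 = 3$. To determine the eigenvectors belonging to the $\lambda_i$'s, we solve the homogeneous system of equations $(\lambda_iI - A)x = 0$. Let us take $\lambda_1 = 0$ first; then the system of equations $(\lambda_1I - A)x = 0$ becomes

$$\begin{aligned} -2x_1 - \sqrt{2}\,x_2 &= 0, \\ -\sqrt{2}\,x_1 - x_2 &= 0, \end{aligned} \qquad\text{or}\qquad x_2 = -\sqrt{2}\,x_1.$$

Hence, $\mathbf{x}_1 = (x_1, x_2) = (-1, \sqrt{2})$ is an eigenvector belonging to $\lambda_1 = 0$, and $E_0 = \{t\mathbf{x}_1 : t \in \mathbb{R}\}$. (Here, one can take any nonzero solution $(x_1, x_2)$ as an eigenvector $\mathbf{x}_1$ belonging to $\lambda_1 = 0$.)

For $\lambda_2 = 3$, the system of equations $(\lambda_2I - A)x = 0$ becomes

$$\begin{aligned} x_1 - \sqrt{2}\,x_2 &= 0, \\ -\sqrt{2}\,x_1 + 2x_2 &= 0, \end{aligned} \qquad\text{or}\qquad x_1 = \sqrt{2}\,x_2.$$

Thus, $\mathbf{x}_2 = (\sqrt{2}, 1)$ is an eigenvector belonging to $\lambda_2 = 3$, and $E_3 = \{t\mathbf{x}_2 : t \in \mathbb{R}\}$. Note that the eigenvectors $\mathbf{x}_1$ and $\mathbf{x}_2$ belonging to the eigenvalues $\lambda_1$ and $\lambda_2$, respectively, are linearly independent.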


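Example 6.2 [Real eigenvalues with multiplicities I] Find a basis for the eigenspaces of

$$A = \begin{bmatrix} 3 & -2 & 0 \\ -2 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}.$$

Solution: The characteristic polynomial of A is $(\lambda - 1)(\lambda - 5)^2$, so that the eigenvalues of A are $\lambda_1 = 1$ and $\lambda_2 = 5$ with multiplicity 2. Thus, there are two eigenspaces $E_1$ and $E_5$ of A. A vector x is in $E_1$ if and only if

$$(I - A)x = \begin{bmatrix} -2 & 2 & 0 \\ 2 & -2 & 0 \\ 0 & 0 & -4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$

Solving this system yields $x_1 = t$, $x_2 = t$, $x_3 = 0$ for $t \in \mathbb{R}$. Thus, the eigenvectors in $E_1$ are nonzero vectors of the form

$$x = \begin{bmatrix} t \\ t \\ 0 \end{bmatrix} = t\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix},$$

so that $\mathbf{x}_1 = (1, 1, 0)$ is a basis for the eigenspace $E_1$.

Now, $x \in E_5$ if and only if

$$(5I - A)x = \begin{bmatrix} 2 & 2 & 0 \\ 2 & 2 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$

Solving this system yields $x_1 = -s$, $x_2 = s$, $x_3 = t$ for $s, t \in \mathbb{R}$. Thus, the eigenvectors in $E_5$ are nonzero vectors of the form

$$x = \begin{bmatrix} -s \\ s \\ t \end{bmatrix} = s\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix} + t\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}$$

for $s, t \in \mathbb{R}$. Since $\mathbf{x}_2 = (-1, 1, 0)$ and $\mathbf{x}_3 = (0, 0, 1)$ are linearly independent, they form a basis for the eigenspace $E_5$.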


Lemma 6.3 If A and B are square matrices similar to each other, then they have the same characteristic polynomial.

Proof: Since there exists a nonsingular matrix Q such that $B = Q^{-1}AQ$,

$$\begin{aligned}
\det(\lambda I - B) &= \det\big(Q^{-1}(\lambda I)Q - Q^{-1}AQ\big) \\
&= \det\big(Q^{-1}(\lambda I - A)Q\big) \\
&= \det Q^{-1}\,\det(\lambda I - A)\,\det Q \\
&= \det(\lambda I - A).
\end{aligned}$$

Thus, similar matrices have the same eigenvalues, i.e., the eigenvalues are invariant under similarity. But their eigenvectors might be different: in fact, x is an eigenvector of B belonging to λ if and only if Qx is an eigenvector of A belonging to λ, since AQ = QB and $AQx = QBx = \lambda Qx$.


Definition 6.3 The set of all eigenvalues of a matrix A is called the spec- 
trum of A. 


Note that some of the eigenvalues, as the zeros of the characteristic 
polynomial, may be complex numbers, or may have multiplicities. 


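Lemma 6.4 Let $\lambda_0$ be an eigenvalue of A with multiplicity m. Then

$$1 \le \dim(E_{\lambda_0}) \le m.$$

Proof: Let $\{v_1, \ldots, v_k\}$ be a basis for $E_{\lambda_0}(A)$ (so $k \ge 1$, since $\lambda_0$ has at least one eigenvector), and extend it to a basis $\beta = \{v_1, \ldots, v_k, v_{k+1}, \ldots, v_n\}$ for $\mathbb{R}^n$. Then, since $v_1, \ldots, v_k$ are eigenvectors belonging to $\lambda_0$,

$$[A]_\beta = \begin{bmatrix} \lambda_0I_k & B \\ 0 & C \end{bmatrix}.$$

Since $A = [A]_\alpha$ and $[A]_\beta$ are similar, they have the same characteristic polynomial by Lemma 6.3:

$$\begin{aligned}
p(\lambda) = \det(\lambda I - A) = \det(\lambda I - [A]_\beta)
&= \det\begin{bmatrix} (\lambda - \lambda_0)I_k & -B \\ 0 & \lambda I_{n-k} - C \end{bmatrix} \\
&= \det\big((\lambda - \lambda_0)I_k\big)\,\det(\lambda I_{n-k} - C) \\
&= (\lambda - \lambda_0)^k g(\lambda).
\end{aligned}$$

The second-to-last equality is due to Exercise 4.21. Hence, $(\lambda - \lambda_0)^k$ is a factor of $p(\lambda)$. Since m is the multiplicity of $\lambda_0$, we have $p(\lambda) = (\lambda - \lambda_0)^m h(\lambda)$ with $h(\lambda_0) \neq 0$, so m is the largest integer such that $(\lambda - \lambda_0)^m$ is a factor of $p(\lambda)$; i.e., $k \le m$.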


Suppose that $\lambda_1, \ldots, \lambda_k$ are all the distinct eigenvalues of A with multiplicities $m_1, \ldots, m_k$, respectively, so that $n = m_1 + \cdots + m_k$. Then any eigenvector x belongs to exactly one eigenspace $E_{\lambda_i}(A)$, since $E_{\lambda_i}(A) \cap E_{\lambda_j}(A) = \{0\}$ for $i \neq j$. If there is any i such that $\dim(E_{\lambda_i}(A)) < m_i$, then A cannot have n linearly independent eigenvectors, and so it cannot be diagonalizable, since to diagonalize a matrix A, all we need is a basis consisting of n linearly independent eigenvectors.


Theorem 6.5 A square matrix A is diagonalizable if and only if

$$\mathbb{R}^n = E_{\lambda_1}(A) \oplus \cdots \oplus E_{\lambda_k}(A),$$

or equivalently, $\dim E_{\lambda_j}(A) = m_j$ for all j = 1, ..., k, where $\lambda_1, \ldots, \lambda_k$ are the distinct eigenvalues of A with multiplicities $m_1, \ldots, m_k$, respectively.

In Examples 6.1 and 6.2, the dimension of each eigenspace $E_\lambda$ is equal to the multiplicity of the eigenvalue λ. However, it may happen that $\dim E_\lambda$ is less than the multiplicity of an eigenvalue λ.


Example 6.3 [Real eigenvalues with multiplicities II] Consider the matrix

$$A = \begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix}.$$

A simple computation shows that the characteristic polynomial of A is $(\lambda - 1)^2$, and so λ = 1 is the eigenvalue of multiplicity 2, with an eigenvector x = (1, 0). From the argument in the next section, one can show that it is the only linearly independent eigenvector, so that $\dim E_1 = 1 < 2$ = the multiplicity of λ: that is, A is not diagonalizable.
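A quick numerical check of Example 6.3, assuming NumPy: the eigenspace dimension is $2 - \operatorname{rank}(I - A)$. `np.linalg.eig` still returns two columns, but for this defective matrix they are numerically parallel, so they cannot serve as the columns of an invertible transition matrix Q.

```python
import numpy as np

A = np.array([[1.0, 2.0], [0.0, 1.0]])               # the matrix of Example 6.3
n = A.shape[0]
dim_E1 = n - np.linalg.matrix_rank(np.eye(n) - A)    # dim N(I - A)
print(dim_E1)                                        # 1, while the multiplicity of 1 is 2

# The two returned "eigenvectors" are (numerically) parallel: det Q is about 0.
lams, Q = np.linalg.eig(A)
print(lams, abs(np.linalg.det(Q)))
```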


In Chapter 8, we will discuss matrices that have fewer linearly independent eigenvectors than the multiplicities of their eigenvalues.

On the other hand, if an eigenvalue of a matrix is a complex number, as the following example shows, it may not be possible to find its eigenvectors unless complex numbers are included among the scalars. That is, we need to expand the set of scalars to the set of complex numbers. This expansion leads us to work with complex vector spaces, which will be discussed in Chapter 7.


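Example 6.4 [Complex eigenvalues] The characteristic polynomial of the rotation matrix

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

is $\lambda^2 - 2\lambda\cos\theta + (\cos^2\theta + \sin^2\theta) = \lambda^2 - 2\lambda\cos\theta + 1$. Thus, the eigenvalues are $\lambda = \cos\theta \pm i\sin\theta$, which are complex numbers, so this matrix, as a rotation of $\mathbb{R}^2$, has no real eigenvalues unless θ = nπ, n = 0, ±1, ±2, ....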
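Numerically, the complex eigenvalues of Example 6.4 appear as a conjugate pair. The sketch below assumes NumPy; `rotation` is a hypothetical helper name, and θ = π/3 is an arbitrary illustrative angle. `np.linalg.eigvals` returns a complex array when a real matrix has non-real eigenvalues.

```python
import numpy as np


def rotation(theta):
    """The rotation matrix of Example 6.4."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])


theta = np.pi / 3
lams = np.linalg.eigvals(rotation(theta))
lams = lams[np.argsort(lams.imag)]           # order: cos - i sin, then cos + i sin
print(lams)
print(np.linalg.eigvals(rotation(np.pi)))    # theta = pi: -1 twice, up to rounding
```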


Problem 6.1 Let A be a 2 × 2 matrix whose characteristic polynomial is

$$\det(\lambda I - A) = \lambda^2 + b\lambda + c.$$

Show that b = -tr(A) and c = det A.

Problem 6.2 Let λ be an eigenvalue of A and x an eigenvector belonging to λ. Use mathematical induction to show that $\lambda^m$ is an eigenvalue of $A^m$ and x is an eigenvector of $A^m$ belonging to $\lambda^m$ for each m = 1, 2, ....


6.2 Eigenvalues and eigenvectors 


In this section, we derive some of the basic properties of the eigenvalues and 
eigenvectors. 


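Lemma 6.6 If A is a triangular matrix, then the diagonal entries are exactly the eigenvalues of A.

Proof: The characteristic polynomial of an upper triangular matrix A satisfies

$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda - a_{11} & * & \cdots & * \\ 0 & \lambda - a_{22} & \cdots & * \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda - a_{nn} \end{bmatrix} = (\lambda - a_{11})\cdots(\lambda - a_{nn}),$$

whose zeros are exactly $a_{11}, \ldots, a_{nn}$. The lower triangular case is the same.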


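Theorem 6.7 Let A be an n × n matrix with n eigenvalues $\lambda_1, \ldots, \lambda_n$ (counted with multiplicity). Then

(1) $\det A = \lambda_1\lambda_2\cdots\lambda_n$,
(2) $\operatorname{tr}(A) = a_{11} + \cdots + a_{nn} = \lambda_1 + \cdots + \lambda_n$.

Proof: (1) Since the eigenvalues $\lambda_1, \ldots, \lambda_n$ are the zeros of the characteristic polynomial of A, we have, for any λ,

$$\det(\lambda I - A) = (\lambda - \lambda_1)\cdots(\lambda - \lambda_n).$$

Thus, if we take λ = 0 on both sides, then we get

$$(-1)^n\det A = (-1)^n\lambda_1\cdots\lambda_n.$$

(2) Expand both sides of the equation in (1) separately:

$$\det\begin{bmatrix} \lambda - a_{11} & -a_{12} & \cdots & -a_{1n} \\ -a_{21} & \lambda - a_{22} & \cdots & -a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ -a_{n1} & -a_{n2} & \cdots & \lambda - a_{nn} \end{bmatrix} = (\lambda - \lambda_1)\cdots(\lambda - \lambda_n).$$

Both sides are polynomials of the form $p(\lambda) = \lambda^n + c_{n-1}\lambda^{n-1} + \cdots + c_1\lambda + c_0$ in λ. On the left, only the product of the diagonal entries contributes to $\lambda^{n-1}$, so $c_{n-1} = -(a_{11} + \cdots + a_{nn})$; on the right, $c_{n-1} = -(\lambda_1 + \cdots + \lambda_n)$. Comparing the coefficients $c_{n-1}$ of $\lambda^{n-1}$, we get $\lambda_1 + \cdots + \lambda_n = a_{11} + \cdots + a_{nn}$.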
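Theorem 6.7 and Corollary 6.8 are easy to spot-check numerically. The sketch below assumes NumPy and uses a random 4 × 4 matrix with a fixed seed; the matrix itself is arbitrary. Its eigenvalues may come in complex conjugate pairs, but their product and sum are real up to rounding.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed; any square matrix would do
A = rng.standard_normal((4, 4))
lams = np.linalg.eigvals(A)             # may contain complex conjugate pairs

print(np.prod(lams), np.linalg.det(A))  # (1) det A = lambda_1 ... lambda_n
print(np.sum(lams), np.trace(A))        # (2) tr(A) = lambda_1 + ... + lambda_n

# Corollary 6.8: det and trace are unchanged by a similarity B = Q^{-1} A Q.
Q = rng.standard_normal((4, 4))         # invertible with probability 1
B = np.linalg.solve(Q, A @ Q)
print(np.linalg.det(B), np.trace(B))
```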


Corollary 6.8 The determinant and the trace of A are invariant under the 
similarity. 


Therefore, $\det A = \lambda_1\cdots\lambda_n = 0$ if and only if $\lambda_i = 0$ for some i. Thus, a square matrix A is singular if and only if zero is an eigenvalue of A; equivalently, A is invertible if and only if zero is not an eigenvalue of A. The following problem is an easy consequence of this fact.


Problem 6.3 Prove that, for any n x n matrices A and B, AB and BA have the 
same eigenvalues. 


Problem 6.4 Find matrices A and B such that det A = det B and tr(A) = tr(B), but A is not similar to B.

Problem 6.5 Show that A and $A^T$ have the same eigenvalues. Do they necessarily have the same eigenvectors?


Problem 6.6 For any n x n matrices A and B, show that AB and BA are similar 
if A or B is nonsingular. 
Problem 6.7 Let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the eigenvalues of an n × n invertible matrix A. Show that the inverse $A^{-1}$ has eigenvalues $\frac{1}{\lambda_1}, \frac{1}{\lambda_2}, \ldots, \frac{1}{\lambda_n}$.

Remark: Not all matrices are diagonalizable. For example, let

$$A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}.$$

Its eigenvalue is $\lambda_1 = \lambda_2 = 1$, of multiplicity 2. If A were diagonalizable, then

$$Q^{-1}AQ = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I$$

for some invertible matrix Q, and then $A = QIQ^{-1}$ would be the identity matrix. Since A is not the identity matrix, no invertible matrix Q can make $Q^{-1}AQ$ diagonal. A similar argument applies to Example 6.3 to show that the matrix there is not diagonalizable, so that it does not have two linearly independent eigenvectors: i.e., $\dim E_1 = 1$.


Example 6.5 Diagonalize the matrix

$$A = \begin{bmatrix} 1 & -3 & 3 \\ 0 & -5 & 6 \\ 0 & -3 & 4 \end{bmatrix}.$$

Solution: A direct calculation shows that the eigenvalues of A are $\lambda_1 = 1$, $\lambda_2 = 1$ and $\lambda_3 = -2$, and their associated eigenvectors are

$$x_1 = (1, 0, 0), \quad x_2 = (0, 1, 1) \quad\text{and}\quad x_3 = (1, 2, 1),$$

respectively. They are linearly independent; the first two vectors $x_1, x_2$ form a basis for the eigenspace $E_1$ belonging to $\lambda_1 = \lambda_2 = 1$, and $x_3$ forms a basis for the eigenspace $E_{-2}$ belonging to $\lambda_3 = -2$. Thus, the matrix

$$P = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 1 & 1 \end{bmatrix}$$

diagonalizes A. In fact, one can verify that

$$P^{-1}AP = \begin{bmatrix} 1 & -1 & 1 \\ 0 & -1 & 2 \\ 0 & 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & -3 & 3 \\ 0 & -5 & 6 \\ 0 & -3 & 4 \end{bmatrix}\begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 0 & 1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}.$$

Choosing different eigenvectors belonging to the eigenvalues 1 or -2 does not change the diagonal matrix: any matrix whose columns are linearly independent eigenvectors will diagonalize A. For example, {(-1, 0, 0), (0, -1, -1)} is another basis for $E_1$, and {(2, 4, 2)} is also a basis for $E_{-2}$. Then for

$$Q = \begin{bmatrix} -1 & 0 & 2 \\ 0 & -1 & 4 \\ 0 & -1 & 2 \end{bmatrix}, \qquad Q^{-1}AQ = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -2 \end{bmatrix}.$$

Reordering the eigenvectors in constructing a transition matrix Q does not change the diagonalizability of A, but the eigenvalues on the main diagonal of the resulting diagonal matrix appear in the order of the corresponding eigenvectors in the transition matrix. For example, if

$$S = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 1 & 1 \end{bmatrix}, \quad\text{then}\quad S^{-1}AS = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
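The three transition matrices of Example 6.5 can be checked mechanically with NumPy. This is only a verification sketch; the helper name `similar` is ours, and $M^{-1}AM$ is obtained by solving $MX = AM$ rather than by forming $M^{-1}$.

```python
import numpy as np

A = np.array([[1, -3, 3], [0, -5, 6], [0, -3, 4]], dtype=float)
P = np.array([[1, 0, 1], [0, 1, 2], [0, 1, 1]], dtype=float)     # columns x1, x2, x3
Q = np.array([[-1, 0, 2], [0, -1, 4], [0, -1, 2]], dtype=float)  # other eigenvector choices
S = P[:, [0, 2, 1]]                                               # order x1, x3, x2


def similar(M):
    """Return M^{-1} A M, computed by solving M X = A M."""
    return np.linalg.solve(M, A @ M)


for name, M in (("P", P), ("Q", Q), ("S", S)):
    print(name, np.diag(similar(M)).round(10))
```

The printed diagonals should read 1, 1, -2 for P and Q and 1, -2, 1 for S, matching the order of the eigenvectors in each matrix.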


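Problem 6.8 Show that the following matrices are not diagonalizable.

$$\text{(1)}\ A = \begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{bmatrix}, \qquad \text{(2)}\ B = \begin{bmatrix} \lambda & 0 & 0 \\ 1 & \lambda & 0 \\ 0 & 1 & \lambda \end{bmatrix}, \qquad \lambda \text{ is any scalar.}$$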


Problem 6.9 Find a 2 x 2 matrix A whose eigenvalues are 2 and 3, and whose 
eigenvectors are (2, 1) and (3, 2), respectively. 


Theorem 6.1 shows how to diagonalize a matrix and what the diagonal 
matrix is when the matrix has a full set of linearly independent eigenvectors. 
The next question is when a square matrix A can have a full set of linearly 
independent eigenvectors. The following theorem shows that it happens if 
an n x n matrix has n distinct eigenvalues. 


Theorem 6.9 Let $\lambda_1, \lambda_2, \ldots, \lambda_k$ be distinct eigenvalues of a matrix A and $x_1, x_2, \ldots, x_k$ eigenvectors belonging to them, respectively. Then $\{x_1, x_2, \ldots, x_k\}$ is linearly independent.

Proof: Let r be the largest integer such that $\{x_1, \ldots, x_r\}$ is linearly independent. If r = k, then there is nothing to prove. Suppose not, i.e., $1 \le r < k$. Then $\{x_1, \ldots, x_{r+1}\}$ is linearly dependent. Thus, there exist scalars $c_1, c_2, \ldots, c_{r+1}$ with $c_{r+1} \neq 0$ such that

$$c_1x_1 + c_2x_2 + \cdots + c_{r+1}x_{r+1} = 0. \qquad (1)$$

Multiplying both sides by A and using $Ax_i = \lambda_ix_i$, i = 1, ..., r+1, we get

$$c_1\lambda_1x_1 + c_2\lambda_2x_2 + \cdots + c_{r+1}\lambda_{r+1}x_{r+1} = 0. \qquad (2)$$

Multiplying both sides of (1) by $\lambda_{r+1}$ and subtracting the resulting equation from (2) yields

$$c_1(\lambda_1 - \lambda_{r+1})x_1 + c_2(\lambda_2 - \lambda_{r+1})x_2 + \cdots + c_r(\lambda_r - \lambda_{r+1})x_r = 0.$$

Since $\{x_1, \ldots, x_r\}$ is linearly independent and $\lambda_1, \ldots, \lambda_{r+1}$ are all distinct, $c_1 = \cdots = c_r = 0$. Then (1) becomes $c_{r+1}x_{r+1} = 0$, so $c_{r+1} = 0$ because $x_{r+1} \neq 0$; a contradiction.

Hence, if the n eigenvalues of an n × n matrix A are all distinct, then their corresponding eigenvectors form a basis for $\mathbb{R}^n$. Thus, by Theorem 6.1, the matrix representation of A with respect to this basis is a diagonal matrix.

Theorem 6.10 If the n eigenvalues of an n × n matrix A are all distinct, then A is diagonalizable.


Of course, the converse is not true: a matrix may have eigenvalues with multiplicity greater than 1, and so have fewer than n distinct eigenvalues. Some such matrices still have n linearly independent eigenvectors, so that they are diagonalizable, because for diagonalizability all we need is n linearly independent eigenvectors (see Example 6.5). Others do not have n linearly independent eigenvectors (see Example 6.3), so that their diagonalization is not possible. These cases will be discussed in Section 8.1.

The next example shows a simple application of diagonalization to the computation of a power $A^k$ of a matrix A.


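Example 6.6 Compute $A^{100}$ for

$$A = \begin{bmatrix} 1 & 4 \\ 3 & 2 \end{bmatrix}.$$

Solution: Its eigenvalues are 5 and -2 with associated eigenvectors (1, 1) and (-4, 3), respectively. Hence

$$Q = \begin{bmatrix} 1 & -4 \\ 1 & 3 \end{bmatrix}$$

diagonalizes A, i.e.,

$$Q^{-1}AQ = \begin{bmatrix} 5 & 0 \\ 0 & -2 \end{bmatrix}, \quad\text{or}\quad A = Q\begin{bmatrix} 5 & 0 \\ 0 & -2 \end{bmatrix}Q^{-1}.$$

Therefore,

$$A^{100} = Q\begin{bmatrix} 5^{100} & 0 \\ 0 & 2^{100} \end{bmatrix}Q^{-1} = \frac{1}{7}\begin{bmatrix} 1 & -4 \\ 1 & 3 \end{bmatrix}\begin{bmatrix} 5^{100} & 0 \\ 0 & 2^{100} \end{bmatrix}\begin{bmatrix} 3 & 4 \\ -1 & 1 \end{bmatrix} = \frac{1}{7}\begin{bmatrix} 3\cdot5^{100} + 4\cdot2^{100} & 4\cdot5^{100} - 4\cdot2^{100} \\ 3\cdot5^{100} - 3\cdot2^{100} & 4\cdot5^{100} + 3\cdot2^{100} \end{bmatrix}.$$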
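Because the entries of $A^{100}$ have about 70 digits, floating point cannot confirm the closed form, but Python's exact integers can. The following sketch compares repeated squaring of A with the formula $A^k = Q\,\mathrm{diag}(5^k, (-2)^k)\,Q^{-1}$. The helper names are hypothetical, and the check that 7 divides every numerator mirrors the factor $1/7 = 1/\det Q$. That divisibility holds for every k because $5 \equiv -2 \pmod 7$.

```python
def mat_mul(X, Y):
    """Product of two 2 x 2 integer matrices given as nested lists."""
    return [[X[0][0] * Y[0][0] + X[0][1] * Y[1][0], X[0][0] * Y[0][1] + X[0][1] * Y[1][1]],
            [X[1][0] * Y[0][0] + X[1][1] * Y[1][0], X[1][0] * Y[0][1] + X[1][1] * Y[1][1]]]


def mat_pow(X, k):
    """X^k by repeated squaring, in exact integer arithmetic."""
    result = [[1, 0], [0, 1]]
    while k > 0:
        if k & 1:
            result = mat_mul(result, X)
        X = mat_mul(X, X)
        k >>= 1
    return result


def power_by_formula(k):
    """A^k = Q diag(5^k, (-2)^k) Q^{-1} for A = [[1, 4], [3, 2]], Q = [[1, -4], [1, 3]]."""
    p, q = 5 ** k, (-2) ** k
    num = [[3 * p + 4 * q, 4 * p - 4 * q], [3 * p - 3 * q, 4 * p + 3 * q]]
    assert all(entry % 7 == 0 for row in num for entry in row)  # the factor 1/7
    return [[entry // 7 for entry in row] for row in num]


A = [[1, 4], [3, 2]]
print(mat_pow(A, 100) == power_by_formula(100))   # True
```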


| 5 —4 4 
Problem 6.10 For the matrix A= | 12 —11 12], 
PeT | 
(1) diagonalize the matrix A, and 
(2) find the eigenvalues of A10 + A’ + 5A. 


6.3 Applications 


6.3.1 Linear difference equations I 


The discrete analog of a differential equation is called a difference equation. Many scientific problems modeled by differential equations can be analyzed through discrete data on a computer. Let us begin with a classical example. Early in the thirteenth century, Fibonacci posed the following problem: “Suppose that a newly born pair of rabbits produces no offspring during the first month of their lives, but each pair gives birth to a new pair once a month from the second month onward. Starting with one (= $x_1$) newly born pair in the first month, how many pairs of rabbits can be bred in a given time, assuming no rabbit dies?”

Initially, there is one pair. After one month there is still one pair, but two months later it gives birth, so there are two pairs. If at the end of n months there are $x_n$ pairs, then after n + 1 months the number will be the $x_n$ pairs plus the $x_{n-1}$ pairs of offspring bred by the $x_{n-1}$ pairs alive at n - 1 months. Therefore, we have a recursive relation of the $x_n$'s: for n ≥ 2,

$$x_{n+1} = x_n + x_{n-1}.$$


6.3. Difference equations 237 


It is convenient to set $x_0 = 0$ and $x_1 = 1$. Then, by recursive substitutions, the first several terms of the sequence become

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ....

This sequence is called the Fibonacci sequence.

Basically, for given initial values $x_0$ and $x_1$, the terms $x_n$ for all n ≥ 2 are uniquely determined by the recursive relation step by step: that is, the value $x_n$ is obtained by computing all the previous terms $x_k$ for k ≤ n - 1 one after another. Hence, for different initial values, the computation has to be repeated all over again. However, by using a matrix equation, the value of $x_n$ can be expressed in terms of the initial values, so that for given initial values it can be computed directly without computing the previous terms, as the next example shows.


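Example 6.7 Find the 2000th Fibonacci number.

Solution: A standard trick is to consider a trivial extra equation $x_n = x_n$ together with the given equation:

$$\begin{aligned} x_{n+1} &= x_n + x_{n-1}, \\ x_n &= x_n, \end{aligned}$$

or in matrix notation,

$$\begin{bmatrix} x_{n+1} \\ x_n \end{bmatrix} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}\begin{bmatrix} x_n \\ x_{n-1} \end{bmatrix}.$$

If we set $\mathbf{x}_n = \begin{bmatrix} x_{n+1} \\ x_n \end{bmatrix}$ and $A = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix}$, then for $\mathbf{x}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ we simply get

$$\mathbf{x}_n = A\mathbf{x}_{n-1} = A^n\mathbf{x}_0, \qquad n = 1, 2, \ldots.$$

Thus, the problem is reduced to computing $A^n$. A simple computation gives the eigenvalues $\lambda_1 = \frac12(1 + \sqrt5)$, $\lambda_2 = \frac12(1 - \sqrt5)$ of A and their associated eigenvectors $v_1 = (\lambda_1, 1)$, $v_2 = (\lambda_2, 1)$, respectively. Thus A is similar to $D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}$ by the transition matrix $Q = [v_1\ v_2] = \begin{bmatrix} \lambda_1 & \lambda_2 \\ 1 & 1 \end{bmatrix}$, so that $A = QDQ^{-1}$ and

$$\begin{bmatrix} x_{n+1} \\ x_n \end{bmatrix} = \mathbf{x}_n = A^n\mathbf{x}_0 = QD^nQ^{-1}\mathbf{x}_0 = \frac{1}{\sqrt5}\begin{bmatrix} \lambda_1 & \lambda_2 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{bmatrix}\begin{bmatrix} 1 & -\lambda_2 \\ -1 & \lambda_1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{\sqrt5}\begin{bmatrix} \lambda_1^{n+1} - \lambda_2^{n+1} \\ \lambda_1^n - \lambda_2^n \end{bmatrix}.$$

Therefore, the Fibonacci numbers satisfy

$$x_n = \frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^n - \left(\frac{1-\sqrt5}{2}\right)^n\right]$$

for n ≥ 0. In particular, for n = 2000,

$$x_{2000} = \frac{1}{\sqrt5}\left[\left(\frac{1+\sqrt5}{2}\right)^{2000} - \left(\frac{1-\sqrt5}{2}\right)^{2000}\right].$$

Note that $x_{2000}$ must be an integer and $\frac{1}{\sqrt5}\left|\frac{1-\sqrt5}{2}\right|^n < \frac12$ for every n ≥ 0 (it is very small for large n), so $x_n$ is the nearest integer to $\frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^n$. Historically, the number $\frac{1+\sqrt5}{2}$, which is very close to the ratio $\frac{x_{n+1}}{x_n}$ for large n, is called the golden mean.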
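The two routes to $x_{2000}$ in Example 6.7 can be compared in Python, as a sketch with hypothetical helper names. `fib_matrix` computes $A^n\mathbf{x}_0$ exactly with integer repeated squaring. `fib_binet` rounds $\frac{1}{\sqrt5}\left(\frac{1+\sqrt5}{2}\right)^n$ using the standard `decimal` module, because an ordinary float overflows long before n = 2000. The working precision of roughly 0.21n + 30 digits is an assumption based on $\log_{10}\frac{1+\sqrt5}{2} \approx 0.209$.

```python
from decimal import Decimal, localcontext


def fib_matrix(n):
    """x_n as the second entry of A^n x_0, A = [[1, 1], [1, 0]], x_0 = (1, 0).

    Exact integers; A^n = [[x_{n+1}, x_n], [x_n, x_{n-1}]] via repeated squaring.
    """
    def mul(X, Y):
        return [[X[0][0] * Y[0][0] + X[0][1] * Y[1][0], X[0][0] * Y[0][1] + X[0][1] * Y[1][1]],
                [X[1][0] * Y[0][0] + X[1][1] * Y[1][0], X[1][0] * Y[0][1] + X[1][1] * Y[1][1]]]
    result, base = [[1, 0], [0, 1]], [[1, 1], [1, 0]]
    while n > 0:
        if n & 1:
            result = mul(result, base)
        base = mul(base, base)
        n >>= 1
    return result[1][0]


def fib_binet(n):
    """Nearest integer to ((1 + sqrt 5)/2)^n / sqrt 5, in high-precision decimals."""
    with localcontext() as ctx:
        ctx.prec = int(0.21 * n) + 30      # enough digits for the integer part
        sqrt5 = Decimal(5).sqrt()
        phi = (1 + sqrt5) / 2
        return int((phi ** n / sqrt5).to_integral_value())


x2000 = fib_matrix(2000)
print(x2000 == fib_binet(2000))
```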


In general, a sequence of numbers may be defined recursively as follows: given a finite set of specific initial numbers, the remaining numbers of the sequence are defined by some recurrence relation (or recursive formula) in terms of the numbers already defined.

Definition 6.4 For a sequence $\{x_n : n \ge 0\}$ of numbers, a recurrence relation (or a difference equation) of the form

$$x_n = a_1x_{n-1} + a_2x_{n-2} + \cdots + a_kx_{n-k} \quad\text{for all } n \ge k,$$

where $a_i$, i = 1, ..., k, are constants with $a_k$ nonzero, is called a linear difference equation of order k.


For a set of $k$ specific initial values $x_0, \ldots, x_{k-1}$, called an initial condition, a recurrence relation defines a unique sequence $\{x_n : n \ge 0\}$ of numbers, which is called a solution. In general, any sequence $\{x_n : n \ge 0\}$


6.3. Difference equations 239 


of numbers that satisfies the difference equation is called a solution of the difference equation. For example, the equation $x_n = ax_{n-1}$ for $n = 1, 2, \ldots$ gives a geometric sequence $x_n = a^nx_0$, and the equation $x_{n+1} = x_n + x_{n-1}$ for $n = 1, 2, \ldots$ with $x_0 = 0$, $x_1 = 1$ gives the Fibonacci sequence.

As with the Fibonacci sequence, the computation of $x_n$ becomes easier if $x_n$ is expressed in terms of the first $k$ initial values, as shown in Example 6.7. This can be achieved by rewriting the linear difference equation in vector form. Consider the system of linear equations consisting of the linear difference equation and some trivial equations:

$$\begin{aligned} x_{n+k-1} &= a_1x_{n+k-2} + a_2x_{n+k-3} + \cdots + a_kx_{n-1}, \\ x_{n+k-2} &= x_{n+k-2}, \\ &\ \ \vdots \\ x_n &= x_n, \end{aligned}$$

for $n \ge 1$. This can be written as a matrix equation:


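$$\begin{bmatrix} x_{n+k-1} \\ x_{n+k-2} \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & \cdots & a_{k-1} & a_k \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}\begin{bmatrix} x_{n+k-2} \\ x_{n+k-3} \\ \vdots \\ x_{n-1} \end{bmatrix}.$$

If we set $\mathbf{x}_n = \begin{bmatrix} x_{n+k-1} \\ \vdots \\ x_{n+1} \\ x_n \end{bmatrix}$ and $A = \begin{bmatrix} a_1 & a_2 & \cdots & a_{k-1} & a_k \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$, then the system of linear equations becomes simply, for $n \ge 1$,

$$\mathbf{x}_n = A\mathbf{x}_{n-1},$$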


which is a difference relation of vectors in $\mathbb{R}^k$, just like the geometric sequence $x_n = ax_{n-1}$ with the solution $x_n = a^nx_0$. Note that the last component of the vector $\mathbf{x}_n$, $n \ge 0$, is just the $n$-th term $x_n$ of the sequence $\{x_n\}$ defined by the original linear difference (or recurrence) equation. Thus the solution sequence of numbers defined by the given linear difference equation and the solution sequence of vectors are essentially the same. Therefore, by abuse of language, we do not distinguish the solution sequence of numbers from that of vectors for a linear difference equation.
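The passage from the scalar recurrence to the vector recursion $\mathbf{x}_n = A\mathbf{x}_{n-1}$ can be checked with a short Python sketch. The helper names are hypothetical. `coeffs` holds $[a_1, \ldots, a_k]$ and `initial` holds $[x_0, \ldots, x_{k-1}]$, so the initial vector is that list reversed, matching $\mathbf{x}_0 = [x_{k-1}\ \cdots\ x_0]^T$.

```python
def companion(coeffs):
    """Companion matrix of x_n = a_1 x_{n-1} + ... + a_k x_{n-k}; coeffs = [a_1, ..., a_k]."""
    k = len(coeffs)
    A = [[0] * k for _ in range(k)]
    A[0] = list(coeffs)          # first row: a_1, ..., a_k
    for i in range(1, k):
        A[i][i - 1] = 1          # ones below the diagonal: the trivial equations
    return A


def mat_vec(A, v):
    """Matrix-vector product for nested lists."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def terms_by_recursion(coeffs, initial, count):
    """x_0, ..., x_{count-1} by applying the scalar recurrence directly."""
    xs = list(initial)
    k = len(coeffs)
    while len(xs) < count:
        n = len(xs)
        xs.append(sum(coeffs[i] * xs[n - 1 - i] for i in range(k)))
    return xs[:count]


def terms_by_matrix(coeffs, initial, count):
    """The same terms via x_n = A x_{n-1}, reading off the last component of each vector."""
    A = companion(coeffs)
    v = list(reversed(initial))  # x_0 = [x_{k-1}, ..., x_1, x_0]^T
    out = []
    for _ in range(count):
        out.append(v[-1])        # last component of the vector x_n is the number x_n
        v = mat_vec(A, v)
    return out


if __name__ == "__main__":
    print(companion([6, -11, 6]))
    print(terms_by_matrix([6, -11, 6], [0, 1, -1], 8))
    print(terms_by_recursion([6, -11, 6], [0, 1, -1], 8))
```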




The $k \times k$ matrix $A$ is called the companion matrix of the linear difference equation. By recursive substitutions, we get

$$\mathbf{x}_n = A^n\mathbf{x}_0.$$

This means that $\mathbf{x}_n$ is expressed in terms of $A^n$ and the initial condition $\mathbf{x}_0 = [x_{k-1}\ \cdots\ x_0]^T \in \mathbb{R}^k$. Thus, given an initial vector $\mathbf{x}_0 \in \mathbb{R}^k$, the solution sequence $\mathbf{x}_n$ is uniquely computed from $\mathbf{x}_n = A\mathbf{x}_{n-1} = A^n\mathbf{x}_0$. Hence, the computation of $\mathbf{x}_n$ reduces to that of the power $A^n$ of the companion matrix $A$, which is simple if $A$ is diagonalizable. Whether $A$ is diagonalizable can easily be decided from the following two lemmas on the eigenvalues and eigenvectors of $A$.


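Lemma 6.11 Let $A$ be the companion matrix of a linear difference equation $x_n = a_1x_{n-1} + a_2x_{n-2} + \cdots + a_kx_{n-k}$ for all $n \ge k$, say

$$A = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{k-1} & a_k \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{bmatrix}$$

with $a_1$ and $a_k$ nonzero.

(1) The characteristic polynomial of $A$ is
$$\det(\lambda I - A) = p(\lambda) = \lambda^k - a_1\lambda^{k-1} - \cdots - a_{k-1}\lambda - a_k.$$
In particular, all eigenvalues of $A$ are nonzero.
(2) If $\lambda_0$ is an eigenvalue of $A$, then $\{x_n = \lambda_0^n : n \ge 0\}$ is the solution of the difference relation $\mathbf{x}_n = A\mathbf{x}_{n-1}$ for the initial values $\mathbf{x}_0 = [\lambda_0^{k-1}\ \cdots\ \lambda_0\ 1]^T$.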


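Proof: (1) Use induction on $k$. The equality is clearly true for $k = 1$. Assume the equality for $k - 1$. For $k$, take the cofactor expansion of $\det(\lambda I - A)$ along the last column, whose only nonzero entries are $-a_k$ in the first row and $\lambda$ in the last row. The minor of the $(1, k)$-entry is upper triangular with $-1$'s on the diagonal, so it equals $(-1)^{k-1}$. The minor of the $(k, k)$-entry is the corresponding determinant for the companion matrix of order $k - 1$. Hence the induction hypothesis gives

$$\begin{aligned} \det(\lambda I - A) &= \lambda(\lambda^{k-1} - a_1\lambda^{k-2} - \cdots - a_{k-1}) + (-1)^{1+k}(-a_k)(-1)^{k-1} \\ &= \lambda^k - a_1\lambda^{k-1} - \cdots - a_{k-1}\lambda - a_k. \end{aligned}$$

The second statement is clear since $p(0) = -a_k \ne 0$.

(2) If $\lambda_0$ is an eigenvalue of $A$, i.e., $\lambda_0^k = a_1\lambda_0^{k-1} + \cdots + a_{k-1}\lambda_0 + a_k$, then multiplying this equation by $\lambda_0^{n-k}$ gives

$$\lambda_0^n = a_1\lambda_0^{n-1} + \cdots + a_{k-1}\lambda_0^{n-k+1} + a_k\lambda_0^{n-k},$$

which means $\{\lambda_0^n : n \ge 0\}$ is a solution.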




Remark: (1) Lemma 6.11 (1) also shows that every monic polynomial, that is, a polynomial whose highest-degree coefficient is 1, is the characteristic polynomial of some matrix. The matrix $A$ is also called the companion matrix of the monic polynomial $p(\lambda) = \lambda^k - a_1\lambda^{k-1} - \cdots - a_{k-1}\lambda - a_k$.

(2) Notice that if we rewrite a linear difference equation as

$$x_n - a_1x_{n-1} - a_2x_{n-2} - \cdots - a_kx_{n-k} = 0,$$

then the characteristic polynomial of the companion matrix $A$ is just the left side with each $x_i$ replaced by $\lambda^i$, divided by the factor $\lambda^{n-k}$. This relation between the linear difference equation and the characteristic polynomial of the companion matrix is the reason why $\{\lambda_0^n : n \ge 0\}$ is a solution when $\lambda_0$ is an eigenvalue.


Lemma 6.12 If $\lambda_0$ is an eigenvalue of the companion matrix $A$ of a linear difference equation of order $k$, then the eigenspace $E(\lambda_0)$ is a 1-dimensional subspace spanned by an eigenvector of the form $\mathbf{v}_0 = [\lambda_0^{k-1}\ \cdots\ \lambda_0\ 1]^T$.

Proof: By solving $(A - \lambda_0I)\mathbf{x} = \mathbf{0}$ directly, one can easily find that $\mathbf{x} = [x_{k-1}\ \cdots\ x_1\ x_0]^T$ is an eigenvector of $A$ if and only if $x_i = \lambda_0x_{i-1} = \lambda_0^ix_0$ for $i = 1, \ldots, k-1$, and $x_0 \ne 0$.
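Lemmas 6.11 and 6.12 can be spot-checked numerically. In this Python sketch (all names hypothetical), `char_poly` evaluates $p(\lambda)$ from Lemma 6.11 (1), and `candidate_eigenvector` builds $[\lambda^{k-1}\ \cdots\ \lambda\ 1]^T$ from Lemma 6.12. For any $\lambda$, the lower $k-1$ rows of $A\mathbf{v}$ and $\lambda\mathbf{v}$ agree, and the first rows differ by exactly $-p(\lambda)$. So $\mathbf{v}$ is an eigenvector precisely when $p(\lambda) = 0$. The companion matrix of Example 6.8 serves as test data.

```python
def companion(coeffs):
    """Companion matrix for x_n = a_1 x_{n-1} + ... + a_k x_{n-k}; coeffs = [a_1, ..., a_k]."""
    k = len(coeffs)
    A = [[0] * k for _ in range(k)]
    A[0] = list(coeffs)
    for i in range(1, k):
        A[i][i - 1] = 1
    return A


def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def char_poly(coeffs, lam):
    """p(lam) = lam^k - a_1 lam^(k-1) - ... - a_(k-1) lam - a_k, as in Lemma 6.11 (1)."""
    k = len(coeffs)
    return lam ** k - sum(a * lam ** (k - 1 - i) for i, a in enumerate(coeffs))


def candidate_eigenvector(lam, k):
    """The vector [lam^(k-1), ..., lam, 1] of Lemma 6.12."""
    return [lam ** (k - 1 - i) for i in range(k)]


def defect(coeffs, lam):
    """A v - lam v for the candidate eigenvector v; it is zero exactly when p(lam) = 0."""
    A = companion(coeffs)
    v = candidate_eigenvector(lam, len(coeffs))
    return [av - lam * x for av, x in zip(mat_vec(A, v), v)]


if __name__ == "__main__":
    coeffs = [6, -11, 6]                      # Example 6.8
    for lam in (1, 2, 3, 4):
        print(lam, char_poly(coeffs, lam), defect(coeffs, lam))
```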


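In particular, if the $k$ eigenvalues $\lambda_1, \ldots, \lambda_k$ are all distinct from each other, then their $k$ eigenvectors $\mathbf{v}_i = [\lambda_i^{k-1}\ \cdots\ \lambda_i\ 1]^T$ are linearly independent, and the transition matrix

$$Q = [\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_k] = \begin{bmatrix} \lambda_1^{k-1} & \lambda_2^{k-1} & \cdots & \lambda_k^{k-1} \\ \vdots & \vdots & & \vdots \\ \lambda_1 & \lambda_2 & \cdots & \lambda_k \\ 1 & 1 & \cdots & 1 \end{bmatrix},$$

which is a Vandermonde matrix, diagonalizes the companion matrix $A$:

$$Q^{-1}AQ = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_k \end{bmatrix} = D.$$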


Hence, the particular solution of a linear difference equation $\mathbf{x}_n = A\mathbf{x}_{n-1}$




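with initial condition $\mathbf{x}_0$ can be written as

$$\mathbf{x}_n = A^n\mathbf{x}_0 = QD^nQ^{-1}\mathbf{x}_0 = [\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_k]\begin{bmatrix} \lambda_1^n & & 0 \\ & \ddots & \\ 0 & & \lambda_k^n \end{bmatrix}\begin{bmatrix} c_1 \\ \vdots \\ c_k \end{bmatrix} = c_1\lambda_1^n\mathbf{v}_1 + c_2\lambda_2^n\mathbf{v}_2 + \cdots + c_k\lambda_k^n\mathbf{v}_k,$$

where $\mathbf{c} = Q^{-1}\mathbf{x}_0 = [c_1\ c_2\ \cdots\ c_k]^T$, for all $n \ge 1$. In expanded form,

$$\begin{bmatrix} x_{n+k-1} \\ \vdots \\ x_{n+1} \\ x_n \end{bmatrix} = c_1\begin{bmatrix} \lambda_1^{n+k-1} \\ \vdots \\ \lambda_1^{n+1} \\ \lambda_1^n \end{bmatrix} + \cdots + c_k\begin{bmatrix} \lambda_k^{n+k-1} \\ \vdots \\ \lambda_k^{n+1} \\ \lambda_k^n \end{bmatrix}.$$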


Theorem 6.13 For a linear difference equation

$$x_n = a_1x_{n-1} + a_2x_{n-2} + \cdots + a_kx_{n-k}, \quad \text{for } n \ge k,$$

with nonzero $a_1$ and $a_k$, if the companion matrix $A$ has $k$ distinct eigenvalues $\lambda_1, \ldots, \lambda_k$, then a general solution $\{x_n : n \ge 0\}$ is given as

$$x_n = c_1\lambda_1^n + c_2\lambda_2^n + \cdots + c_k\lambda_k^n,$$

where $c_1, \ldots, c_k$ are arbitrary constants. Moreover, if an initial condition $\mathbf{x}_0 = [x_{k-1}\ \cdots\ x_0]^T$ is given, the constants $\mathbf{c} = (c_1, \ldots, c_k)$ are determined as the solution of $Q\mathbf{c} = \mathbf{x}_0$ for $Q = [\mathbf{v}_1\ \mathbf{v}_2\ \cdots\ \mathbf{v}_k]$.
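Theorem 6.13 reduces a particular solution to a single linear system $Q\mathbf{c} = \mathbf{x}_0$ with a Vandermonde coefficient matrix. The following Python sketch assumes the distinct eigenvalues are already known and solves that system exactly with `fractions.Fraction`. Plain Gauss-Jordan elimination is our own choice of solver, not something the text prescribes, and the function names are hypothetical.

```python
from fractions import Fraction


def solve(M, b):
    """Solve M c = b exactly by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(y)] for row, y in zip(M, b)]
    for col in range(n):
        # Pick a row with a nonzero pivot in this column and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fundamental_coefficients(eigenvalues, initial):
    """Constants c_1, ..., c_k of Theorem 6.13 from Q c = x_0.

    eigenvalues: the k distinct eigenvalues lambda_1, ..., lambda_k.
    initial: [x_0, ..., x_{k-1}]; the vector x_0 lists them in reverse order.
    """
    k = len(eigenvalues)
    # Row r of the Vandermonde matrix Q holds lambda_j^(k-1-r), so the last row is all ones.
    Q = [[lam ** (k - 1 - r) for lam in eigenvalues] for r in range(k)]
    return solve(Q, list(reversed(initial)))


def closed_form(eigenvalues, c, n):
    """x_n = c_1 lambda_1^n + ... + c_k lambda_k^n, evaluated exactly."""
    return sum(ci * Fraction(lam) ** n for ci, lam in zip(c, eigenvalues))


if __name__ == "__main__":
    c = fundamental_coefficients([1, 2, 3], [0, 1, -1])
    print(c)
    print([closed_form([1, 2, 3], c, n) for n in range(6)])
```

With the data of Examples 6.8 and 6.9, `fundamental_coefficients([1, 2, 3], [0, 1, -1])` gives $-3, 5, -2$, and `fundamental_coefficients([1, -1, -2, 3], [0, 1, -1, 2])` gives $\frac{5}{12}, -\frac{1}{8}, -\frac{4}{15}, -\frac{1}{40}$. The elimination step assumes $Q$ is invertible, which holds when the eigenvalues are distinct.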


In general, given a linear difference equation $\mathbf{x}_n = A\mathbf{x}_{n-1}$, there may be many sequences of vectors in $\mathbb{R}^k$ that satisfy the equation, called solutions. In fact, the set $S$ of all possible solutions of a linear difference equation forms a vector space (see Example 2.1 (4)).


Theorem 6.14 The set $S$ of solutions, called the solution space, of a linear difference equation $\mathbf{x}_n = A\mathbf{x}_{n-1}$ of order $k$ is isomorphic to $\mathbb{R}^k$.

Proof: Note that a solution sequence $\{\mathbf{x}_n : n \ge 0\}$ of vectors is determined uniquely by the initial condition $\mathbf{x}_0 = [x_{k-1}\ \cdots\ x_0]^T$ in $\mathbb{R}^k$. This means that the transformation $T : S \to \mathbb{R}^k$ defined by $T(\{\mathbf{x}_n\}) = \mathbf{x}_0$, which is clearly linear, is an isomorphism.


Therefore, any solution of a linear difference equation may be written as 
a linear combination of a basis for S. 




Definition 6.5 A basis for the solution space $S$ is called a fundamental set of solutions. A solution written as a linear combination of a basis for $S$ with arbitrary coefficients is called a general solution of the matrix equation. The solution determined by a particular initial condition $\mathbf{x}_0 = [x_{k-1}\ \cdots\ x_1\ x_0]^T$ is called the particular solution.

In Theorem 6.13, the solutions $\{\lambda_i^n : n \ge 0\}$, $i = 1, \ldots, k$, are the ones determined by the initial conditions $\mathbf{x}_0 = Q\mathbf{e}_i = \mathbf{v}_i$, for $i = 1, 2, \ldots, k$, and so they form a fundamental set of solutions.


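Example 6.8 Solve the linear difference equation

$$x_n = 6x_{n-1} - 11x_{n-2} + 6x_{n-3} \quad \text{for } n \ge 3$$

with initial values $x_0 = 0$, $x_1 = 1$, $x_2 = -1$.

Solution: In matrix form, it is

$$\mathbf{x}_n = \begin{bmatrix} x_{n+2} \\ x_{n+1} \\ x_n \end{bmatrix} = \begin{bmatrix} 6 & -11 & 6 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} x_{n+1} \\ x_n \\ x_{n-1} \end{bmatrix} = A\mathbf{x}_{n-1} \quad \text{for } n \ge 1.$$

By Lemma 6.11, the characteristic polynomial of $A$ is

$$\det(\lambda I - A) = \lambda^3 - 6\lambda^2 + 11\lambda - 6 = (\lambda - 1)(\lambda - 2)(\lambda - 3).$$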


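Thus the eigenvalues are $\lambda_1 = 1$, $\lambda_2 = 2$, $\lambda_3 = 3$ and their associated eigenvectors are

$$\mathbf{v}_1 = \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}, \quad \mathbf{v}_2 = \begin{bmatrix} 4 \\ 2 \\ 1 \end{bmatrix}, \quad \mathbf{v}_3 = \begin{bmatrix} 9 \\ 3 \\ 1 \end{bmatrix},$$

respectively. The transition matrix and its inverse can be found to be

$$Q = \begin{bmatrix} 1 & 4 & 9 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix}, \qquad Q^{-1} = \frac{1}{2}\begin{bmatrix} 1 & -5 & 6 \\ -2 & 8 & -6 \\ 1 & -3 & 2 \end{bmatrix},$$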




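so that, with $\mathbf{x}_0 = [x_2\ x_1\ x_0]^T = [-1\ 1\ 0]^T$, one gets

$$\mathbf{x}_n = A^n\mathbf{x}_0 = QD^nQ^{-1}\mathbf{x}_0 = \begin{bmatrix} 1 & 4 & 9 \\ 1 & 2 & 3 \\ 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1^n & 0 & 0 \\ 0 & 2^n & 0 \\ 0 & 0 & 3^n \end{bmatrix}\begin{bmatrix} -3 \\ 5 \\ -2 \end{bmatrix},$$

or

$$\begin{bmatrix} x_{n+2} \\ x_{n+1} \\ x_n \end{bmatrix} = \begin{bmatrix} -3 + 5 \times 2^{n+2} - 2 \times 3^{n+2} \\ -3 + 5 \times 2^{n+1} - 2 \times 3^{n+1} \\ -3 + 5 \times 2^n - 2 \times 3^n \end{bmatrix}.$$

Thus, the solution is $\{x_n = -3 + 5 \times 2^n - 2 \times 3^n : n \ge 0\}$.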


Example 6.9 Solve the linear difference equation

$$x_n = x_{n-1} + 7x_{n-2} - x_{n-3} - 6x_{n-4} \quad \text{for } n \ge 4$$

with initial values $x_0 = 0$, $x_1 = 1$, $x_2 = -1$, $x_3 = 2$.

Solution: The characteristic polynomial of the companion matrix

$$A = \begin{bmatrix} 1 & 7 & -1 & -6 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$$

is, by Lemma 6.11,

$$\det(\lambda I - A) = \lambda^4 - \lambda^3 - 7\lambda^2 + \lambda + 6 = (\lambda - 1)(\lambda + 1)(\lambda + 2)(\lambda - 3),$$

so that $A$ has four distinct eigenvalues $\lambda_1 = 1$, $\lambda_2 = -1$, $\lambda_3 = -2$, $\lambda_4 = 3$. Hence, the geometric sequences $\{1^n\}$, $\{(-1)^n\}$, $\{(-2)^n\}$, $\{3^n\}$ form a fundamental set of solutions. By Theorem 6.13, a general solution is

$$\{x_n = c_11^n + c_2(-1)^n + c_3(-2)^n + c_43^n : n \ge 0\}.$$


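Now $[c_1\ c_2\ c_3\ c_4]^T$ is the solution of

$$\begin{bmatrix} 1 & -1 & -8 & 27 \\ 1 & 1 & 4 & 9 \\ 1 & -1 & -2 & 3 \\ 1 & 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ c_3 \\ c_4 \end{bmatrix} = \begin{bmatrix} 2 \\ -1 \\ 1 \\ 0 \end{bmatrix},$$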




which is a system of linear equations with a $4 \times 4$ Vandermonde matrix as its coefficient matrix. The solution can easily be found to be $c_1 = \frac{5}{12}$, $c_2 = -\frac{1}{8}$, $c_3 = -\frac{4}{15}$, $c_4 = -\frac{1}{40}$, and so its particular solution is

$$x_n = \frac{5}{12} - \frac{1}{8}(-1)^n - \frac{4}{15}(-2)^n - \frac{1}{40}3^n.$$


Problem 6.11 Let $\{a_n\}$ be a sequence with $a_0 = 1$, $a_1 = 2$, $a_2 = 0$, and the recurrence relation $a_n = 2a_{n-1} + a_{n-2} - 2a_{n-3}$ for $n \ge 3$. Find the $n$-th term $a_n$.


Lemma 6.12 shows that the companion matrix is not diagonalizable if it has an eigenvalue with multiplicity greater than 1, since each of its eigenspaces is only 1-dimensional. The next example illustrates what the solution looks like in this case.


Example 6.10 (Recurrence relation with a repeated eigenvalue) Solve the linear difference equation

$$x_n = -2x_{n-1} - x_{n-2} \quad \text{for } n \ge 2$$

with initial values $x_0 = 1$, $x_1 = 2$.

Solution: The characteristic polynomial is

$$\lambda^2 + 2\lambda + 1 = (\lambda + 1)^2,$$

and so $\lambda = -1$ is an eigenvalue of multiplicity 2. Hence, there is only one linearly independent eigenvector $\mathbf{v} = [-1\ 1]^T$, and the geometric sequence $\{x_n\} = \{(-1)^n\}$ is a solution of the difference equation. Since the solution space is of dimension 2 by Theorem 6.14, we need one more solution that is independent of $\{x_n\} = \{(-1)^n\}$. In this case one can easily verify that $\{x_n\} = \{n(-1)^n\}$ is another solution of the difference equation. In fact,

$$-2x_{n-1} - x_{n-2} = -2(n-1)(-1)^{n-1} - (n-2)(-1)^{n-2} = n(-1)^n = x_n.$$

Clearly, the two solutions $\{(-1)^n\}$ and $\{n(-1)^n\}$ are linearly independent, and so $x_n = c_1(-1)^n + c_2n(-1)^n$ is a general solution. The initial condition gives $c_1 = 1$, $c_2 = -3$, and $x_n = (-1)^n - 3n(-1)^n$ is the particular solution of the difference equation.
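A quick way to see the role of $\{n(-1)^n\}$ is to test candidate sequences against the recurrence directly. In this Python sketch (names hypothetical), `satisfies_recurrence` checks $f(n) = -2f(n-1) - f(n-2)$ for small $n$. Both $(-1)^n$ and $n(-1)^n$ pass, while $n^2(-1)^n$ does not, consistent with the eigenvalue $-1$ having multiplicity exactly 2.

```python
def by_recursion(count):
    """First `count` terms of x_n = -2 x_{n-1} - x_{n-2} with x_0 = 1, x_1 = 2."""
    xs = [1, 2]
    while len(xs) < count:
        xs.append(-2 * xs[-1] - xs[-2])
    return xs[:count]


def particular(n):
    """Closed form x_n = (-1)^n - 3 n (-1)^n from Example 6.10."""
    return (-1) ** n - 3 * n * (-1) ** n


def satisfies_recurrence(f, count=30):
    """Check f(n) = -2 f(n-1) - f(n-2) for 2 <= n < count."""
    return all(f(n) == -2 * f(n - 1) - f(n - 2) for n in range(2, count))


if __name__ == "__main__":
    print(satisfies_recurrence(lambda n: (-1) ** n))          # True
    print(satisfies_recurrence(lambda n: n * (-1) ** n))      # True
    print(satisfies_recurrence(lambda n: n * n * (-1) ** n))  # False
    print(by_recursion(6))                                    # [1, 2, -5, 8, -11, 14]
```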


In general, if the companion matrix $A$ of a linear difference equation

$$x_n = a_1x_{n-1} + \cdots + a_kx_{n-k}, \quad n \ge k,$$




has an eigenvalue $\lambda$ of multiplicity $m > 1$, then

$$\{\lambda^n\}, \ \{n\lambda^n\}, \ \ldots, \ \{n^{m-1}\lambda^n\}$$

are $m$ linearly independent solutions of the difference equation. The following problem asks you to verify this fact, which can be done by direct computation.


Problem 6.12 Verify that if $\lambda$ is an eigenvalue of multiplicity $m > 1$ of the companion matrix of a linear difference equation, then $x_n = \lambda^n, n\lambda^n, \ldots, n^{m-1}\lambda^n$ are $m$ linearly independent solutions of the difference equation.

Problem 6.13 Solve the linear difference equation $x_n = 3x_{n-1} - 4x_{n-3}$ for $n \ge 3$. What is the solution if $x_0 = 1$, $x_1 = x_2 = 1$?


In Chapter 8, we will show how to find, in general, a set of linearly independent solutions when the companion matrix is not diagonalizable (see Section 8.3.4).


6.3.2 Discrete dynamical systems I 


Motivated by linear difference equations, one may consider a recursive matrix equation

$$\mathbf{x}_n = A\mathbf{x}_{n-1}, \quad n = 1, 2, \ldots,$$

for any $k \times k$ matrix $A$ and $\mathbf{x}_0 \in \mathbb{R}^k$. This relation also represents a mathematical model of dynamical processes that change over time, and the vectors $\mathbf{x}_n \in \mathbb{R}^k$ carry most of the information about the dynamical process as the time $n$ passes. In this sense, the sequence $\{\mathbf{x}_n\}_{n=0}^\infty$ is also called a discrete dynamical system. Such systems appear widely in areas such as economics, electrical engineering, and ecology.

The sequence $\{\mathbf{x}_n\}_{n=0}^\infty$ of vectors $\mathbf{x}_n = A\mathbf{x}_{n-1}$ is obtained simply by computing $A^n$, $n = 1, 2, \ldots$, even though the computation of $A^n$ might not be easy. We are mainly concerned with the limit

$$\lim_{n\to\infty}\mathbf{x}_n = \lim_{n\to\infty}(A^n\mathbf{x}_0) = \left(\lim_{n\to\infty}A^n\right)\mathbf{x}_0.$$

The second equality is based on the following definition and theorem:


6.3. Discrete dynamical systems 247 


Definition 6.6 A sequence of matrices $A_1, A_2, A_3, \ldots$ of the same size is said to converge to the matrix $L$, which is called the limit of the sequence and denoted by $L = \lim_{n\to\infty}A_n$, if, for all $i, j$,

$$\lim_{n\to\infty}[A_n]_{ij} = [L]_{ij}.$$


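Theorem 6.15 Let $A_1, A_2, A_3, \ldots$ be a sequence of $k \times m$ matrices such that $\lim_{n\to\infty}A_n = L$. Then

$$\lim_{n\to\infty}BA_n = BL \quad \text{and} \quad \lim_{n\to\infty}A_nC = LC$$

for any matrices $B$ and $C$ for which the products are defined.

Proof: By comparing the $(i, j)$-entries of both sides,

$$\lim_{n\to\infty}[BA_n]_{ij} = \lim_{n\to\infty}\left(\sum_{\ell=1}^k[B]_{i\ell}[A_n]_{\ell j}\right) = \sum_{\ell=1}^k[B]_{i\ell}\lim_{n\to\infty}[A_n]_{\ell j} = \sum_{\ell=1}^k[B]_{i\ell}[L]_{\ell j} = [BL]_{ij},$$

we get $\lim_{n\to\infty}BA_n = BL$. Similarly, $\lim_{n\to\infty}A_nC = LC$.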


In general, a discrete dynamical system $\{\mathbf{x}_n = A^n\mathbf{x}_0 : n = 1, 2, \ldots\}$ is said to be

(1) stable if $\lim_{n\to\infty}\mathbf{x}_n$ exists,
(2) unstable (or divergent) if $\lim_{n\to\infty}\mathbf{x}_n$ diverges,
(3) neutrally stable if, for some integer $N$, the set $\{\mathbf{x}_n : n \ge N\}$ is contained in a bounded set.


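For example, if a $k \times k$ matrix $A$ is diagonalizable with $k$ linearly independent eigenvectors $\mathbf{v}_1, \ldots, \mathbf{v}_k$ belonging to the eigenvalues $\lambda_1, \ldots, \lambda_k$, respectively, then $A = QDQ^{-1}$ for $Q = [\mathbf{v}_1\ \cdots\ \mathbf{v}_k]$, and, for each integer $n \ge 1$, $A^n = QD^nQ^{-1}$. Thus, as Theorem 6.6 shows, the vector $\mathbf{x}_n = A^n\mathbf{x}_0$ can be written as

$$\mathbf{x}_n = a_1\lambda_1^n\mathbf{v}_1 + a_2\lambda_2^n\mathbf{v}_2 + \cdots + a_k\lambda_k^n\mathbf{v}_k,$$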




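where $a_1, a_2, \ldots, a_k$ are some constants depending on the initial vector $\mathbf{x}_0$. Moreover, by Theorem 6.15,

$$\lim_{n\to\infty}A^n = Q\left(\lim_{n\to\infty}D^n\right)Q^{-1} = Q\begin{bmatrix} \lim_{n\to\infty}\lambda_1^n & & 0 \\ & \ddots & \\ 0 & & \lim_{n\to\infty}\lambda_k^n \end{bmatrix}Q^{-1}.$$

Thus $\lim_{n\to\infty}A^n$ exists (equivalently, $\lim_{n\to\infty}\mathbf{x}_n$ exists for every initial vector $\mathbf{x}_0$) if and only if $\lim_{n\to\infty}\lambda_j^n$ exists for all $j = 1, \ldots, k$.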


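Theorem 6.16 Let $A$ be a diagonalizable matrix. Then $\lim_{n\to\infty}A^n$ exists if and only if each eigenvalue $\lambda$ of $A$ satisfies either $|\lambda| < 1$ or $\lambda = 1$.

Proof: $|\lambda| < 1$ or $\lambda = 1$ if and only if $\lim_{n\to\infty}\lambda^n$ exists, in which case

$$\lim_{n\to\infty}\lambda^n = \begin{cases} 1 & \text{if } \lambda = 1, \\ 0 & \text{if } |\lambda| < 1. \end{cases}$$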


Hence, a dynamical system with a diagonalizable matrix $A$ is stable if and only if every eigenvalue $\lambda$ of $A$ satisfies $|\lambda| < 1$ or $\lambda = 1$. On the other hand, if there exists an eigenvalue $\lambda$ with $|\lambda| > 1$, then the magnitude of the vectors $\mathbf{x}_n$ may grow exponentially, so that the system is unstable. However, there is not much to say if there is an eigenvalue $\lambda$ such that $|\lambda| = 1$ but $\lambda \ne 1$. In general, if $A$ is diagonalizable and all the eigenvalues $\lambda$ of $A$ satisfy $|\lambda| \le 1$, then $\{\mathbf{x}_n : n \ge 0\}$ is bounded, and the system is neutrally stable. Therefore, the stability of a dynamical system depends on the magnitudes of the eigenvalues. In particular, if an eigenvalue $\lambda$ is a complex number of modulus 1, i.e., $|\lambda| = 1$ and $\lambda \ne 1$, then the system is neutral in the sense that $\|\lambda^n\mathbf{v}\| = \|\mathbf{v}\|$ for all $n$. The following example illustrates stable and unstable systems.


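Example 6.11 Solve a discrete dynamical system $\mathbf{x}_n = A\mathbf{x}_{n-1}$, $n = 1, 2, \ldots$, where

$$\text{(1) } A = \begin{bmatrix} 0.8 & 0.0 \\ 0.0 & 0.5 \end{bmatrix}, \qquad \text{(2) } A = \begin{bmatrix} 1.2 & 0.0 \\ 0.0 & 0.6 \end{bmatrix}.$$

Solution: (1) Clearly, the eigenvalues of $A$ are 0.8 and 0.5 with eigenvectors $\mathbf{v}_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$, respectively. If $\mathbf{x}_0 = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}$, then

$$\mathbf{x}_n = c_1(0.8)^n\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2(0.5)^n\begin{bmatrix} 0 \\ 1 \end{bmatrix} \to \begin{bmatrix} 0 \\ 0 \end{bmatrix} \quad \text{as } n \to \infty.$$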




Figure 6.1: A stable dynamical system 


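It follows that the system $\mathbf{x}_n = A\mathbf{x}_{n-1}$ is stable. (See Figure 6.1)
(2) Similarly, one can show that

$$\mathbf{x}_n = c_1(1.2)^n\begin{bmatrix} 1 \\ 0 \end{bmatrix} + c_2(0.6)^n\begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

Since $(1.2)^n \to \infty$, $\mathbf{x}_n$ diverges whenever $c_1 \ne 0$, so this system is unstable. (See Figure 6.2)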


Figure 6.2: An unstable dynamical system 
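The two systems of Example 6.11 can be simulated directly. This Python sketch (names hypothetical) applies the diagonal matrices repeatedly and reports the Euclidean length of $\mathbf{x}_n$. Starting from $\mathbf{x}_0 = (0, 1)$, that is, $c_1 = 0$, the second system also tends to $\mathbf{0}$, which is why its instability requires $c_1 \ne 0$.

```python
import math


def iterate(diagonal, x0, steps):
    """Apply x_n = A x_{n-1} `steps` times for the diagonal matrix A = diag(diagonal)."""
    x = list(x0)
    for _ in range(steps):
        x = [d * xi for d, xi in zip(diagonal, x)]
    return x


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(c * c for c in v))


if __name__ == "__main__":
    for n in (0, 10, 20, 40):
        stable = norm(iterate([0.8, 0.5], [1.0, 1.0], n))
        unstable = norm(iterate([1.2, 0.6], [1.0, 1.0], n))
        print(n, round(stable, 6), round(unstable, 2))
```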


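Example 6.12 Solve a discrete dynamical system $\mathbf{x}_n = A\mathbf{x}_{n-1}$, $n = 1, 2, \ldots$, where

$$A = \frac{1}{2}\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}.$$

Solution: The eigenvalues of $A$ are 1 and 2 with eigenvectors $\mathbf{v}_1 = \begin{bmatrix} 1 \\ -1 \end{bmatrix}$ and $\mathbf{v}_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, respectively. With $Q = \begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}$ and $D = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$, a general solution of $\mathbf{x}_n = A\mathbf{x}_{n-1}$ is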


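$$\mathbf{x}_n = QD^nQ^{-1}\mathbf{x}_0 = c_11^n\mathbf{v}_1 + c_22^n\mathbf{v}_2 = c_1\begin{bmatrix} 1 \\ -1 \end{bmatrix} + c_22^n\begin{bmatrix} 1 \\ 1 \end{bmatrix} = \begin{bmatrix} c_1 + c_22^n \\ -c_1 + c_22^n \end{bmatrix},$$

where $\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = Q^{-1}\mathbf{x}_0 = \frac{1}{2}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}\mathbf{x}_0$. By eliminating $c_22^n$ from the two equations, we see that the points $\mathbf{x}_n$ all lie on the line $y = x - 2c_1$ for any constant $c_1$. For example, if $\mathbf{x}_0$ is a multiple of $\mathbf{v}_1$, then $c_2 = 0$ and so $\mathbf{x}_n = \mathbf{x}_0$ for all $n$. On the other hand, if $c_2 \ne 0$, then the term $c_22^n\mathbf{v}_2$ grows without bound, so $\mathbf{x}_n$ moves off to infinity along the line $y = x - 2c_1$.

Thus the system $\mathbf{x}_n = A^n\mathbf{x}_0$ is unstable.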
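The claim that each orbit of Example 6.12 stays on the line $y = x - 2c_1$ can be checked exactly with rational arithmetic. This Python sketch (names hypothetical) iterates $A = \frac{1}{2}\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$ from a chosen initial point. The starting values $(3, 1)$ and $(1, -1)$ are our own illustrative choices, not taken from the text.

```python
from fractions import Fraction

# A = (1/2) [[3, 1], [1, 3]] from Example 6.12, stored exactly.
A = [[Fraction(3, 2), Fraction(1, 2)],
     [Fraction(1, 2), Fraction(3, 2)]]


def step(v):
    """One step x_n = A x_{n-1}."""
    return [A[0][0] * v[0] + A[0][1] * v[1], A[1][0] * v[0] + A[1][1] * v[1]]


def orbit(x0, steps):
    """The points x_0, x_1, ..., x_steps."""
    points = [[Fraction(c) for c in x0]]
    for _ in range(steps):
        points.append(step(points[-1]))
    return points


def c1(x0):
    """First entry of Q^{-1} x_0 with Q^{-1} = (1/2)[[1, -1], [1, 1]], i.e. (x0 - y0)/2."""
    return (Fraction(x0[0]) - Fraction(x0[1])) / 2


if __name__ == "__main__":
    x0 = (3, 1)
    for x, y in orbit(x0, 5):
        print(x, y, y == x - 2 * c1(x0))
```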


Example 6.13 (Markov process) Suppose that the population of a certain metropolitan area remains constant but there is a continual movement of people between the city and the suburbs. Suppose that $x_0$ people of the metropolitan area live in the suburbs and $y_0$ people live in the city at the beginning, so that $x_0 + y_0 = a$ is fixed. Suppose also that each year 20% of the people outside the city move in, and 10% of the people inside move out. What is the eventual distribution of the population?


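Solution: At the end of the first year, the distribution of the population $(x_1, y_1)$ will be

$$\begin{aligned} x_1 &= 0.8x_0 + 0.1y_0, \\ y_1 &= 0.2x_0 + 0.9y_0. \end{aligned}$$

Or, in matrix form,

$$\mathbf{x}_1 = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix} = \begin{bmatrix} 0.8 & 0.1 \\ 0.2 & 0.9 \end{bmatrix}\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} = A\mathbf{x}_0.$$

Thus if $\mathbf{x}_n = (x_n, y_n)$ denotes the distribution of the population in the metropolitan area after $n$ years, we get $\mathbf{x}_n = A^n\mathbf{x}_0$. To find the distribution of the population after $n$ years, it suffices to compute $A^n$. The eigenvalues and eigenvectors of $A$ are $\lambda_1 = 1$, $\lambda_2 = 0.7$ and $\mathbf{v}_1 = (1, 2)$, $\mathbf{v}_2 = (-1, 1)$, respectively, so that

$$Q = \begin{bmatrix} 1 & -1 \\ 2 & 1 \end{bmatrix}, \qquad Q^{-1} = \frac{1}{3}\begin{bmatrix} 1 & 1 \\ -2 & 1 \end{bmatrix}.$$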




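Hence, $Q^{-1}AQ = \begin{bmatrix} 1 & 0 \\ 0 & 0.7 \end{bmatrix} = D$ and

$$\begin{aligned} \mathbf{x}_n = A^n\mathbf{x}_0 = QD^nQ^{-1}\mathbf{x}_0 &= c_11^n\mathbf{v}_1 + c_2(0.7)^n\mathbf{v}_2 \\ &= \left(\frac{x_0 + y_0}{3}\right)\begin{bmatrix} 1 \\ 2 \end{bmatrix} + \left(\frac{-2x_0 + y_0}{3}\right)(0.7)^n\begin{bmatrix} -1 \\ 1 \end{bmatrix} \\ &\to \left(\frac{x_0 + y_0}{3}\right)\begin{bmatrix} 1 \\ 2 \end{bmatrix} = \frac{a}{3}\begin{bmatrix} 1 \\ 2 \end{bmatrix} \quad \text{as } n \to \infty. \end{aligned}$$

Note that, since $x_n + y_n = x_0 + y_0 = a$ is fixed for all $n$, the process remains on the straight line $x + y = a$, or $y = -x + a$, over time. Since $\mathbf{x}_n \to \frac{a}{3}\begin{bmatrix} 1 \\ 2 \end{bmatrix}$ for a given initial total population $a$, the ratio of suburban to city residents eventually tends to $1 : 2$, regardless of the initial distribution.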
For particular populations of a = 3, 6, 9 million people, the processes 


are shown in Figure 6.3. 


Figure 6.3: A Markov process A”xo 
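The convergence in Example 6.13 is easy to observe numerically. In this Python sketch (names hypothetical), `simulate` applies $A$ year by year, while `closed_form` evaluates the eigenvector expansion derived in the solution. The total population $a = 6$ and the initial split $(6, 0)$ in the usage lines are illustrative choices.

```python
def markov_step(x, y):
    """One year of Example 6.13: (x, y) -> A (x, y) with A = [[0.8, 0.1], [0.2, 0.9]]."""
    return 0.8 * x + 0.1 * y, 0.2 * x + 0.9 * y


def simulate(x0, y0, years):
    """Apply the yearly step `years` times (x = suburbs, y = city)."""
    x, y = x0, y0
    for _ in range(years):
        x, y = markov_step(x, y)
    return x, y


def closed_form(x0, y0, n):
    """x_n = ((x0 + y0)/3)(1, 2) + ((-2 x0 + y0)/3)(0.7)^n (-1, 1)."""
    c1 = (x0 + y0) / 3
    c2 = (-2 * x0 + y0) / 3
    return c1 - c2 * 0.7 ** n, 2 * c1 + c2 * 0.7 ** n


if __name__ == "__main__":
    for n in (0, 1, 5, 20, 60):
        print(n, simulate(6.0, 0.0, n), closed_form(6.0, 0.0, n))
```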


This example is concerned with predicting the state of an object that is constrained to be in exactly one of several possible states at a given time but that changes states in some random manner. This kind of process is called a stochastic process. The objects in the problem are the people in the metropolitan area, and the state of each of them is living either in the city or in the suburbs. Note that the matrix $A$ in Example 6.13 satisfies the following two conditions:

(1) the entries of $A$ are all nonnegative, because the entries of each column of $A$ represent the probabilities of residing in one of the two locations in the next year,




(2) the entries of each column of $A$ add up to 1, because the total population of the metropolitan area remains constant.


Any matrix $A$ with these two properties is called a stochastic matrix (or a transition matrix). Such matrices have quite distinctive properties, some of which we will exploit in what follows and in Section 8.3.

In general, for an arbitrary $n \times n$ stochastic matrix $A$, the rows and columns correspond to $n$ states, and $[A]_{ij}$ represents the probability of moving from state $j$ into state $i$ in one stage (or year). In the example, there are two states (being in the suburbs and being in the city), and $[A]_{21}$, for example, represents the probability of moving from the suburbs to the city in one stage.

The state of an object may change to a different state in the next stage depending on

(1) the state in question,

(2) the time in question,

(3) some or all of the previous states in which the object has been, or
(4) the states that the other objects are in or have been in.


However, if, as in the example, the probability that an object in one state changes to a different state depends only on the two states (and not on the time, earlier states, or other factors), then the stochastic process is called a Markov process. If, in addition, the number of possible states is finite, then the Markov process is called a Markov chain.


Note that the components of the state vectors $\mathbf{x}_n = [x_n\ y_n]^T$ satisfy $x_n/a \ge 0$, $y_n/a \ge 0$ and $x_n/a + y_n/a = 1$, so they represent the proportions of the objects in the two given states at stage $n \ge 0$. A vector such as $\frac{1}{a}\mathbf{x}_n$, whose components are nonnegative and sum to 1, is called a probability vector. In this sense, each column of a stochastic matrix $A$ is also a probability vector.


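Lemma 6.17 (1) The product of two stochastic matrices is a stochastic matrix. In particular, any power of a stochastic matrix is a stochastic matrix.

(2) The product of a stochastic matrix and a probability vector is a probability vector.

Proof: Let $\mathbf{u} = [1\ 1\ \cdots\ 1]^T \in \mathbb{R}^k$. Since products of matrices with nonnegative entries have nonnegative entries, it suffices to check the sums. A matrix $A$ with nonnegative entries is a stochastic matrix if and only if $A^T\mathbf{u} = \mathbf{u}$, and a vector $\mathbf{x}$ with nonnegative entries is a probability vector if and only if $\mathbf{u}^T\mathbf{x} = 1$.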




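Let $A$ and $B$ be two stochastic matrices, and $\mathbf{x}$ a probability vector. Then

$$\begin{aligned} (AB)^T\mathbf{u} &= B^T(A^T\mathbf{u}) = B^T\mathbf{u} = \mathbf{u}, \\ \mathbf{u}^T(A\mathbf{x}) &= (\mathbf{u}^TA)\mathbf{x} = (A^T\mathbf{u})^T\mathbf{x} = \mathbf{u}^T\mathbf{x} = 1. \end{aligned}$$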
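Lemma 6.17 can be spot-checked on the matrices of Problems 6.14 and 6.15. This Python sketch (names hypothetical) tests the two defining conditions, nonnegative entries and column sums equal to 1. It uses exact `Fraction` entries so that round-off cannot disturb the equality test. A probability vector is stored as a one-column matrix, which is itself a stochastic matrix.

```python
from fractions import Fraction as F


def is_stochastic(M):
    """True if M has nonnegative entries and every column sums to 1 (i.e. M^T u = u)."""
    return (all(x >= 0 for row in M for x in row)
            and all(sum(row[j] for row in M) == 1 for j in range(len(M[0]))))


def mat_mult(X, Y):
    """Product of an m x p and a p x q matrix stored as nested lists."""
    return [[sum(X[i][l] * Y[l][j] for l in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


# Matrices of Problems 6.14 and 6.15, with exact rational entries.
LAND_USE = [[F(8, 10), F(1, 10), F(0)],
            [F(1, 10), F(7, 10), F(1, 10)],
            [F(1, 10), F(2, 10), F(9, 10)]]
CAR_RENTAL = [[F(6, 10), F(1, 10), F(2, 10)],
              [F(2, 10), F(2, 10), F(2, 10)],
              [F(2, 10), F(7, 10), F(6, 10)]]
# The 1990 land-use percentages of Problem 6.14 as a probability vector (3 x 1 matrix).
X0 = [[F(3, 10)], [F(2, 10)], [F(5, 10)]]

if __name__ == "__main__":
    print(is_stochastic(mat_mult(LAND_USE, CAR_RENTAL)))
    print(mat_mult(LAND_USE, X0))
```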


Note that $A$ and $A^T$ have the same set of eigenvalues by Problem 6.5. Since a stochastic matrix $A$ satisfies $A^T\mathbf{u} = \mathbf{u}$ for $\mathbf{u} = [1\ 1\ \cdots\ 1]^T$, the number 1 is an eigenvalue of $A^T$, and hence also of $A$. Thus there is an eigenvector $\mathbf{x}$ of $A$ belonging to $\lambda = 1$. In Section 8.3.5, we will show that we may assume that $\mathbf{x}$ has only nonnegative components. Then some constant multiple of this vector is a probability vector as well as an eigenvector of $A$. Thus $A\mathbf{x} = \mathbf{x}$ means that the Markov chain is completely static at $\mathbf{x}$, since $A^n\mathbf{x} = \mathbf{x}$ for all $n$. Such an eigenvector $\mathbf{x}$ of $A$ is called a steady state.


Theorem 6.18 If $A$ is a stochastic matrix, then
(1) $\lambda = 1$ is an eigenvalue of $A$,
(2) there exists a steady state $\mathbf{x}$ that remains fixed by the Markov process.

We will continue the discussion of interesting properties of stochastic matrices in Section 8.3.5.


Problem 6.14 Suppose that the land use in a city in 1990 is

Residential $x_0 = 30\%$
Commercial $y_0 = 20\%$
Industrial $z_0 = 50\%$.

Denote by $x_k$, $y_k$, $z_k$ the percentages of residential, commercial, and industrial land, respectively, after $k$ years, and assume that the stochastic matrix is given as follows:

$$\begin{bmatrix} x_{k+1} \\ y_{k+1} \\ z_{k+1} \end{bmatrix} = \begin{bmatrix} 0.8 & 0.1 & 0.0 \\ 0.1 & 0.7 & 0.1 \\ 0.1 & 0.2 & 0.9 \end{bmatrix}\begin{bmatrix} x_k \\ y_k \\ z_k \end{bmatrix}.$$

Find the land use in the city after 50 years. This problem has two essential properties of a Markov process: the total area of the city stays fixed, and each portion of area can never become negative.


Problem 6.15 A car rental company has three branch offices in different cities. When a car is rented at one of the offices, it may be returned to any of the three offices. This company started business with 900 cars, and initially an equal number of cars was distributed to each office. When the week-by-week distribution of cars is governed by a stochastic matrix

$$A = \begin{bmatrix} 0.6 & 0.1 & 0.2 \\ 0.2 & 0.2 & 0.2 \\ 0.2 & 0.7 & 0.6 \end{bmatrix},$$

determine the number of cars at each office in the $k$-th week. Also, find $\lim_{k\to\infty}A^k$.




6.3.3 Linear differential equations I 


For those who are not familiar with differential equations, we begin this 
section with some basic preliminaries about them. 

Let $y = f(t)$ be a real-valued differentiable function on an interval $I = (a, b)$ containing 0. From elementary calculus, it is easy to see that the differential equation $y'(t) = \frac{dy}{dt} = 5y(t)$ has a general solution $y = ce^{5t}$, where $c$ is an arbitrary constant. If we are given an additional condition $f(0) = y_0 = 3$, called an initial condition, then the solution of $y' = 5y$ is $y = 3e^{5t}$, called the particular solution.

This can be extended to a system of $n$ linear differential equations with constant coefficients, which is by definition of the form

$$\begin{aligned} y_1' &= a_{11}y_1 + a_{12}y_2 + \cdots + a_{1n}y_n \\ y_2' &= a_{21}y_1 + a_{22}y_2 + \cdots + a_{2n}y_n \\ &\ \ \vdots \\ y_n' &= a_{n1}y_1 + a_{n2}y_2 + \cdots + a_{nn}y_n, \end{aligned}$$

where $y_i = y_i(t)$, for $i = 1, 2, \ldots, n$, are real-valued differentiable functions on an interval $I = (a, b)$ and $y_i' = \frac{dy_i}{dt}$ is the derivative of $y_i$. In most cases, one may assume that the interval $I$ contains 0, and some initial conditions are given as $y_i(0) = d_i$ at $0 \in I$.

Let $\mathbf{y} = [y_1\ y_2\ \cdots\ y_n]^T$ be a vector-valued function whose components are differentiable functions on an interval $I = (a, b)$. Then, for each $t \in I$, its value $\mathbf{y}(t) = [y_1(t)\ y_2(t)\ \cdots\ y_n(t)]^T$ is a vector in $\mathbb{R}^n$, and its derivative is defined as

$$\mathbf{y}'(t) = \begin{bmatrix} y_1'(t) \\ y_2'(t) \\ \vdots \\ y_n'(t) \end{bmatrix}, \quad \text{for } t \in I.$$

Now if we denote the coefficient matrix of the system by $A$, then the system may be written, in matrix form, as

$$\mathbf{y}' = A\mathbf{y} \text{ on } I, \quad \text{or} \quad \mathbf{y}'(t) = A\mathbf{y}(t) \text{ for all } t \in I,$$

with an initial condition $\mathbf{y}_0 = \mathbf{y}(0) = (d_1, \ldots, d_n) \in \mathbb{R}^n$.

A differentiable vector function $\mathbf{y}(t)$ is called a solution of the system $\mathbf{y}'(t) = A\mathbf{y}(t)$ if it satisfies the equation. In general, the entries of the coefficient matrix $A$ could be functions. However, in this book, we restrict our attention to systems with constant coefficients.


6.3. Application: Differential equations 255 


Example 6.14 Consider the following three systems: 


y = 2y — 3y2 E = ty + ty E = wy — 343 
2y ye ly = yty’ y y? + 5y2 




The first two systems are linear, but the coefficients of the second are func- 
tions. The third one is not linear. 


For a system $\mathbf{y}'(t) = A\mathbf{y}(t)$ of linear differential equations defined on $I = (a, b)$, the fundamental theorem of ordinary differential equations says that, given an initial condition $\mathbf{y}_0 \in \mathbb{R}^n$, the system has a unique solution $\mathbf{y}(t) = [y_1(t)\ y_2(t)\ \cdots\ y_n(t)]^T$ on $I$ such that $\mathbf{y}(0) = \mathbf{y}_0$.

Note that, geometrically, a solution $\mathbf{y}(t)$ describes a smooth curve in $\mathbb{R}^n$ passing through the initial vector $\mathbf{y}_0 = \mathbf{y}(0) = (d_1, \ldots, d_n)$ as $t$ varies in the interval $I$, and its derivative $\mathbf{y}'(t)$ describes the tangent vector to the curve for $t \in I$.


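Lemma 6.19 Let $S$ denote the set of all solutions of the system $\mathbf{y}'(t) = A\mathbf{y}(t)$. Then $S$ is a vector space.

Proof: If $\mathbf{y}_1, \mathbf{y}_2$ are solutions of the system, then

$$(c_1\mathbf{y}_1 + c_2\mathbf{y}_2)' = c_1\mathbf{y}_1' + c_2\mathbf{y}_2' = c_1A\mathbf{y}_1 + c_2A\mathbf{y}_2 = A(c_1\mathbf{y}_1 + c_2\mathbf{y}_2).$$

Thus, $c_1\mathbf{y}_1 + c_2\mathbf{y}_2$ is also a solution for any constants $c_1, c_2$.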


The linear dependence of a set $\{\mathbf{y}_1, \ldots, \mathbf{y}_n\}$ of $n$ vector-valued functions on $I$ is defined as usual. They are linearly dependent on the interval $I$ if there exist constants $a_1, a_2, \ldots, a_n$, not all zero, such that $a_1\mathbf{y}_1 + \cdots + a_n\mathbf{y}_n = \mathbf{0}$, i.e., $a_1\mathbf{y}_1(t) + \cdots + a_n\mathbf{y}_n(t) = \mathbf{0}$ for all $t \in I$. Otherwise, they are said to be linearly independent.

Note that, for a set $\{\mathbf{y}_1, \ldots, \mathbf{y}_n\}$ of $n$ linearly independent vector-valued functions on $I$, there may be some point $t$ in $I$ where their values $\mathbf{y}_1(t), \ldots, \mathbf{y}_n(t)$, as vectors in $\mathbb{R}^n$, are linearly independent, and some other point $s$ in $I$ where their values are linearly dependent (see Example 2.25).

However, Theorem 6.21 below shows that, if $\{\mathbf{y}_1, \ldots, \mathbf{y}_n\}$ is a set of $n$ solutions of a system $\mathbf{y}'(t) = A\mathbf{y}(t)$ of linear differential equations defined on $I = (a, b)$, then the linear independence of $\mathbf{y}_1(t), \ldots, \mathbf{y}_n(t)$ at some point $t \in I$ guarantees their linear independence at every point $t \in I$.




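To see this, let us define a matrix function

$$Y(t) = [\mathbf{y}_1(t)\ \mathbf{y}_2(t)\ \cdots\ \mathbf{y}_n(t)] = \begin{bmatrix} y_{11}(t) & y_{12}(t) & \cdots & y_{1n}(t) \\ y_{21}(t) & y_{22}(t) & \cdots & y_{2n}(t) \\ \vdots & \vdots & & \vdots \\ y_{n1}(t) & y_{n2}(t) & \cdots & y_{nn}(t) \end{bmatrix}$$

from a set $\mathbf{y}_1, \ldots, \mathbf{y}_n$ of $n$ solutions of $\mathbf{y}'(t) = A\mathbf{y}(t)$. The determinant function $W(t) = \det Y(t)$, for $t \in I$, is called the Wronskian of the solutions.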


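Lemma 6.20 The Wronskian $W(t)$ is an exponential function of the form $W(t) = W_0e^{\mathrm{tr}(A)t}$ with an initial condition $W(0) = W_0$.

Proof: It suffices to show that $W$ satisfies $W'(t) = \mathrm{tr}(A)W(t)$:

$$\begin{aligned} W'(t) &= (\det Y(t))' = \sum_{\sigma\in S_n}\mathrm{sgn}(\sigma)\left(y_{1\sigma(1)}y_{2\sigma(2)}\cdots y_{n\sigma(n)}\right)' \\ &= \sum_{\sigma\in S_n}\mathrm{sgn}(\sigma)\,y_{1\sigma(1)}'y_{2\sigma(2)}\cdots y_{n\sigma(n)} + \cdots + \sum_{\sigma\in S_n}\mathrm{sgn}(\sigma)\,y_{1\sigma(1)}y_{2\sigma(2)}\cdots y_{n\sigma(n)}' \\ &= (y_{11}'Y_{11} + \cdots + y_{1n}'Y_{1n}) + \cdots + (y_{n1}'Y_{n1} + \cdots + y_{nn}'Y_{nn}) \\ &= \sum_{i,j}y_{ij}'Y_{ij} = \sum_{i,j}[Y']_{ij}[\mathrm{adj}\,Y]_{ji} = \mathrm{tr}(Y'\cdot\mathrm{adj}\,Y) = \mathrm{tr}(A\cdot(Y\cdot\mathrm{adj}\,Y)) \\ &= \mathrm{tr}(\det Y(t)\,A) = \mathrm{tr}(A)W(t), \end{aligned}$$

where $Y_{ij}(t)$ is the cofactor of $y_{ij}$, and the equalities in the last two lines are due to the facts that

$$Y'(t) = [\mathbf{y}_1'(t)\ \cdots\ \mathbf{y}_n'(t)] = A[\mathbf{y}_1(t)\ \cdots\ \mathbf{y}_n(t)] = AY(t), \qquad Y(t)\,\mathrm{adj}\,Y(t) = \det Y(t)\,I_n = W(t)\,I_n.$$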


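Note that $\mathbf{y}_1(t_0), \ldots, \mathbf{y}_n(t_0)$ are linearly independent at some $t_0 \in I$ if and only if $W(t_0) \ne 0$. Since $e^{\mathrm{tr}(A)t} > 0$ for all $t$, either $W(t) \ne 0$ for all $t \in I$ (if $W_0 \ne 0$) or $W(t) = 0$ for all $t \in I$ (if $W_0 = 0$). This proves the following theorem:

Theorem 6.21 Let $\{\mathbf{y}_1, \ldots, \mathbf{y}_n\}$ be a set of $n$ solutions of a system $\mathbf{y}' = A\mathbf{y}$ on $I$. Then $\mathbf{y}_1(t), \ldots, \mathbf{y}_n(t)$ are linearly independent in $\mathbb{R}^n$ at some $t \in I$ if and only if $\mathbf{y}_1(t), \ldots, \mathbf{y}_n(t)$ are linearly independent in $\mathbb{R}^n$ at every $t \in I$.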




Thus either condition of the theorem is equivalent to the condition that $\{\mathbf{y}_1, \ldots, \mathbf{y}_n\}$ is linearly independent on $I$.
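Lemma 6.20 can be checked numerically. This Python sketch takes for granted (an assumption here, since the text has not yet introduced it for general $A$) that the solutions whose initial vectors are the columns of $Y_0$ are the columns of $Y(t) = e^{tA}Y_0$. Here $e^{tA} = \sum_{m \ge 0}(tA)^m/m!$ is approximated by a truncated Taylor series. The $2 \times 2$ matrices and the helper names are hypothetical.

```python
import math


def mat_mult(X, Y):
    """Product of two n x n matrices stored as nested lists."""
    n = len(X)
    return [[sum(X[i][l] * Y[l][j] for l in range(n)) for j in range(n)] for i in range(n)]


def expm_series(A, t, terms=60):
    """Approximate e^{tA} = sum_{m >= 0} (tA)^m / m! by its first `terms` Taylor terms."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]                      # (tA)^0 / 0! = I
    for m in range(1, terms):
        term = mat_mult(term, A)                           # multiply by A ...
        term = [[x * t / m for x in row] for row in term]  # ... and by t/m
        result = [[r + s for r, s in zip(rr, ss)] for rr, ss in zip(result, term)]
    return result


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def wronskian(A, Y0, t):
    """W(t) = det Y(t) for the 2 x 2 solution matrix Y(t) = e^{tA} Y0."""
    return det2(mat_mult(expm_series(A, t), Y0))


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, -4.0]]      # hypothetical matrix with tr(A) = -3
    Y0 = [[1.0, 0.0], [1.0, 1.0]]      # W(0) = det Y0 = 1
    for t in (0.0, 0.5, 1.0):
        print(t, wronskian(A, Y0, t), math.exp(-3 * t))
```

For a rank-deficient $Y_0$ the computed Wronskian stays at 0. For the rotation matrix $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, whose trace is 0, it stays at 1.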


Theorem 6.22 The solution space $S$ of a system $\mathbf{y}' = A\mathbf{y}$ on $I$, where $A$ is an $n \times n$ matrix, is isomorphic to $\mathbb{R}^n$.

Proof: This is a direct consequence of the fundamental theorem of ordinary differential equations: the existence of a unique solution for each initial condition in $\mathbb{R}^n$. Indeed, the assignment $\mathbf{y}(t) \mapsto \mathbf{y}(0) = \mathbf{y}_0$, which sends each solution of the system to its initial condition, is an isomorphism between $S$ and $\mathbb{R}^n$.

Definition 6.7 A basis for $S$ is called a fundamental set of solutions for the interval $I$. A linear combination of a fundamental set with arbitrary coefficients is called a general solution of the system. The solution determined by a given initial condition is called the particular solution.


Thus a general solution is determined by a basis for S, and any partic- 
ular solution can be obtained from a general solution by determining the 
coefficients from the given initial condition. 


Example 6.15 For the system 


Oa el ale ee 


one can easily verify that the vector functions 


vo [283] -[8] o gT 


are solutions of the system. Their Wronskian is 


et 


W (t) = det | s ee | = 240, 


for all $t$. Thus $\{\mathbf{y}_1, \mathbf{y}_2\}$ is a fundamental set and $\mathbf{y}(t) = c_1\mathbf{y}_1(t) + c_2\mathbf{y}_2(t)$ is a general solution. If the initial condition is given as $\mathbf{y}(0) = (2, 0)$, then


” TEER 


one gets $c_1 = c_2 = 1$, and the particular solution is $\mathbf{y}(t) = \mathbf{y}_1(t) + \mathbf{y}_2(t)$.




Example 6.16 One of the fundamental problems of mathematical ecology 
is the predator-prey problem. Let a(t) and y(t) denote the populations at 
time ¢ of two species in a specified region, one of which x preys upon the 
other y. For example, x(t) and y(t) may be the number of sharks and small 
fishes, respectively, in a restricted region of the ocean. Without the fishes 
(preys) the population of the sharks (predators) will decrease, and without 
the sharks the population of the fishes will increase. A mathematical model 
showing their interactions and whether an ecological balance exists can be 
written as the following system of differential equations: 


i a(t) = ax(t)—b x(t)y(t), 
y(t) = —ey(t) +d a(t)y(t). 


In this equation, the coefficients a and c are the birth rate of x and the death 
rate of y, respectively. The nonlinear x(t)y(t) terms in the two equations 
mean the interaction of the two species, so the coefficients 6 and d are 
the measures of the effect of the interaction between them. A study of this 
general system of differential equations leads to very interesting development 
in the theory of dynamical systems and can be found in any book on ordinary 
differential equations. Here, we restrict our study to the case of x and y very 
small, ¿.e., near the origin in the plane. In this case, one can neglect the 
nonlinear terms in the equations, so the system is assumed to be given as: 


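$$\begin{bmatrix} x'(t) \\ y'(t) \end{bmatrix} = \begin{bmatrix} a & 0 \\ 0 & -c \end{bmatrix} \begin{bmatrix} x(t) \\ y(t) \end{bmatrix}.$$
Thus the eigenvalues are $\lambda_1 = a$ and $\lambda_2 = -c$, with eigenvectors $e_1$ and $e_2$, respectively. Hence, its general solution is
$$\begin{bmatrix} x(t) \\ y(t) \end{bmatrix} = d_1 e^{at} e_1 + d_2 e^{-ct} e_2 = \begin{bmatrix} d_1 e^{at} \\ d_2 e^{-ct} \end{bmatrix}.$$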
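The next sketch illustrates what "neglecting the nonlinear terms" means near the origin. It integrates both the full model and the linearized system with a classical fourth-order Runge-Kutta scheme and compares them with the closed form $d_1e^{at}$, $d_2e^{-ct}$. The parameter values, the initial populations and the step count are hypothetical choices made only for illustration.

```python
import math

# Hypothetical parameters and small initial populations (near the origin).
a, b, c, d = 0.5, 0.1, 0.3, 0.1
x0, y0 = 0.01, 0.02

def rk4(f, state, t_end, steps=2000):
    """Classical fourth-order Runge-Kutta for the autonomous system s' = f(s)."""
    h = t_end / steps
    s = list(state)
    for _ in range(steps):
        k1 = f(s)
        k2 = f([si + h / 2 * ki for si, ki in zip(s, k1)])
        k3 = f([si + h / 2 * ki for si, ki in zip(s, k2)])
        k4 = f([si + h * ki for si, ki in zip(s, k3)])
        s = [si + h / 6 * (p + 2 * q + 2 * r + w)
             for si, p, q, r, w in zip(s, k1, k2, k3, k4)]
    return s

def nonlinear(s):
    x, y = s
    return [a * x - b * x * y, -c * y + d * x * y]

def linearized(s):
    x, y = s
    return [a * x, -c * y]

T = 2.0
closed_form = [x0 * math.exp(a * T), y0 * math.exp(-c * T)]   # d1 e^{aT}, d2 e^{-cT}
lin = rk4(linearized, [x0, y0], T)
nl = rk4(nonlinear, [x0, y0], T)
print(closed_form, lin, nl)   # the three agree closely for small populations
```

The linearized run should agree with the closed form up to rounding. For populations this small, the nonlinear run differs from it by well under one percent.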


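In the following, we consider a system of homogeneous linear differential equations $y'(t) = Ay(t)$ on J with an initial condition $y_0 = (d_1, d_2, \ldots, d_n)$.
(1) Suppose first that A is a diagonal matrix D. Then
$$y'(t) = \begin{bmatrix} y_1'(t) \\ \vdots \\ y_n'(t) \end{bmatrix} = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} \begin{bmatrix} y_1(t) \\ \vdots \\ y_n(t) \end{bmatrix} = \begin{bmatrix} \lambda_1 y_1(t) \\ \vdots \\ \lambda_n y_n(t) \end{bmatrix}$$
consists of just n simple linear equations of the first order:
$$y_i'(t) = \lambda_i y_i(t), \quad i = 1, 2, \ldots, n.$$
Hence, its solution is easily found to be $y_i(t) = d_i e^{\lambda_i t}$ with $y_i(0) = d_i$, for $i = 1, \ldots, n$. Thus, in matrix notation, the solution can be written as
$$y(t) = \begin{bmatrix} y_1(t) \\ \vdots \\ y_n(t) \end{bmatrix} = \begin{bmatrix} d_1 e^{\lambda_1 t} \\ \vdots \\ d_n e^{\lambda_n t} \end{bmatrix} = d_1 e^{\lambda_1 t} e_1 + \cdots + d_n e^{\lambda_n t} e_n = \begin{bmatrix} e^{\lambda_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n t} \end{bmatrix} \begin{bmatrix} d_1 \\ \vdots \\ d_n \end{bmatrix} = e^{tD} y_0,$$
where $e^{tD}$ is, by definition,
$$e^{tD} = \begin{bmatrix} e^{\lambda_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n t} \end{bmatrix}.$$
This matrix is called the exponential matrix of tD.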


Remark: Actually, the above solution $e^{tD}y_0$ is a general solution of the system. Indeed, if one takes the standard basis $\{e_1, \ldots, e_n\}$ for $\mathbb{R}^n$ as the initial conditions, one can easily see that the corresponding solutions $y_i(t) = e^{tD}e_i = e^{\lambda_i t}e_i$, $i = 1, \ldots, n$, form a fundamental set of solutions.


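(2) We next assume that the matrix A in the system $y'(t) = Ay(t)$ is diagonalizable, that is, it has n linearly independent eigenvectors $v_1, \ldots, v_n$ belonging to the eigenvalues $\lambda_1, \ldots, \lambda_n$, respectively. Then, for the transition matrix $Q = [v_1 \cdots v_n]$, we have
$$A = QDQ^{-1} = Q \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix} Q^{-1}.$$
Thus the system becomes $Q^{-1}y' = DQ^{-1}y$. If we change variables to the new vector $x = Q^{-1}y$ (or $y = Qx$), then we obtain a new system $x' = Dx$ with initial condition $x_0 = Q^{-1}y_0$. By (1), its solution is
$$x(t) = e^{tD}x_0 = \begin{bmatrix} e^{\lambda_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n t} \end{bmatrix} \begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix}.$$
Therefore, a general solution of the original system $y' = Ay$ is
$$y(t) = Qx(t) = Qe^{tD}x_0 = [v_1 \cdots v_n] \begin{bmatrix} c_1 e^{\lambda_1 t} \\ \vdots \\ c_n e^{\lambda_n t} \end{bmatrix} = c_1 e^{\lambda_1 t}v_1 + c_2 e^{\lambda_2 t}v_2 + \cdots + c_n e^{\lambda_n t}v_n,$$
where $x_0 = x(0) = Q^{-1}y_0 = (c_1, \ldots, c_n)$.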

Remark: Note that each vector function $y_i(t) = e^{\lambda_i t}v_i$ is the solution of the system determined by the initial condition $y_i(0) = v_i$, for $i = 1, \ldots, n$. Since the $v_i$ are linearly independent eigenvectors of A, these solutions form a fundamental set of solutions.


Example 6.17 Solve the system of linear differential equations 


$$\begin{aligned} y_1' &= 5y_1 - 4y_2 + 4y_3 \\ y_2' &= 12y_1 - 11y_2 + 12y_3 \\ y_3' &= 4y_1 - 4y_2 + 5y_3, \end{aligned}$$
and also find its particular solution satisfying the initial conditions $y_1(0) = 0$, $y_2(0) = 3$ and $y_3(0) = 2$.


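Solution: In matrix form, the system may be written as $y' = Ay$ with
$$A = \begin{bmatrix} 5 & -4 & 4 \\ 12 & -11 & 12 \\ 4 & -4 & 5 \end{bmatrix}.$$
(1) The eigenvalues of A are $\lambda_1 = \lambda_2 = 1$ and $\lambda_3 = -3$, and their eigenvectors are $v_1 = (1, 1, 0)$, $v_2 = (-1, 0, 1)$ and $v_3 = (1, 3, 1)$, respectively, which are clearly linearly independent (see Problem 6.10).
(2) For
$$Q = [v_1\ v_2\ v_3] = \begin{bmatrix} 1 & -1 & 1 \\ 1 & 0 & 3 \\ 0 & 1 & 1 \end{bmatrix}, \qquad Q^{-1}AQ = D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -3 \end{bmatrix}.$$
(3) By Corollary 6.27, we get
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = ae^{t}\begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix} + be^{t}\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} + ce^{-3t}\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} = ae^{t}v_1 + be^{t}v_2 + ce^{-3t}v_3.$$
(4) From the initial condition $y_0 = [0\ 3\ 2]^T$, we obtain $x_0 = Q^{-1}y_0 = [0\ 1\ 1]^T$, i.e., $a = 0$ and $b = c = 1$. Thus the particular solution is
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = e^{t}\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix} + e^{-3t}\begin{bmatrix} 1 \\ 3 \\ 1 \end{bmatrix} = \begin{bmatrix} -e^{t} + e^{-3t} \\ 3e^{-3t} \\ e^{t} + e^{-3t} \end{bmatrix}.$$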

(3) For a general matrix A, a general solution of a system y’ = Ay of 
linear differential equations will be discussed in the next section. 
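The particular solution found in Example 6.17 can be verified directly. The sketch below checks that it satisfies $y(0) = (0, 3, 2)$ and that its term-by-term derivative equals $Ay(t)$ at a few sample times. The sample times and helper names are arbitrary choices for illustration.

```python
import math

A = [[5, -4, 4], [12, -11, 12], [4, -4, 5]]

def y(t):
    """Particular solution of Example 6.17."""
    return [-math.exp(t) + math.exp(-3 * t), 3 * math.exp(-3 * t), math.exp(t) + math.exp(-3 * t)]

def y_prime(t):
    """Term-by-term derivative of y(t)."""
    return [-math.exp(t) - 3 * math.exp(-3 * t), -9 * math.exp(-3 * t),
            math.exp(t) - 3 * math.exp(-3 * t)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

ts = [0.0, 0.5, 1.0, 2.0]
residual = max(abs(p - q) for t in ts for p, q in zip(y_prime(t), matvec(A, y(t))))
print(y(0.0))      # [0.0, 3.0, 2.0]
print(residual)    # rounding-level, so y' = Ay holds
```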


6.4 Exponential matrices 


Now, for a general square matrix A, it turns out that the solution of the system $y' = Ay$ with $y(0) = y_0$ is of the form
$$y = e^{tA}y_0,$$
provided that $e^{tA}$ is made meaningful. In fact, motivated by the Maclaurin series of the exponential function $e^x$, the exponential matrix $e^A$ of a matrix A is defined as the limit of an infinite series of matrices of the following form:

For an n × n square matrix A, the series
$$\sum_{k=0}^{\infty}\frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \cdots + \frac{A^k}{k!} + \cdots$$
is defined to be the limit of the sequence of the partial sums
$$S_m = \sum_{k=0}^{m}\frac{A^k}{k!} = I + A + \frac{A^2}{2!} + \cdots + \frac{A^m}{m!},$$
provided the limit exists.


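Theorem 6.23 For any square matrix A, the sequence $\{[S_m]_{ij} : m \geq 0\}$ converges to a number $L_{ij}$, for all $1 \leq i, j \leq n$.

Proof: Since A has only $n^2$ entries, there is a number M such that $|a_{ij}| \leq M$ for all (i, j)-entries $a_{ij}$ of A. Then one can easily show, by induction on k, that $|[A^k]_{ij}| \leq n^{k-1}M^k$ for all $k \geq 1$ and $1 \leq i, j \leq n$. Hence
$$\sum_{k=0}^{m}\frac{|[A^k]_{ij}|}{k!} \leq 1 + \sum_{k=1}^{m}\frac{n^{k-1}M^k}{k!} \leq 1 + \frac{1}{n}e^{nM}.$$
Thus the partial sums $\sum_{k=0}^{m}|[A^k]_{ij}|/k!$ form a monotonically increasing sequence that is bounded above, so the series $\sum_k [A^k]_{ij}/k!$ converges absolutely. Consequently, $[S_m]_{ij}$ converges to a limit number $L_{ij}$.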


Definition 6.8 The limit matrix, whose entries are $L_{ij} = \lim_{m\to\infty}[S_m]_{ij}$, is denoted by
$$e^A = I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots = \sum_{k=0}^{\infty}\frac{A^k}{k!},$$
which is called the exponential matrix of A. Thus, $[e^A]_{ij} = L_{ij}$.

In practice, computing $e^A$ involves computing the powers $A^k$ for all $k \geq 1$, which is not easy in general. Nevertheless, Theorem 6.23 shows that the limit $e^A$ exists for any square matrix A.
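The partial sums $S_m$ can be computed term by term, since $A^k/k! = (A^{k-1}/(k-1)!)\,A/k$. The sketch below assumes the sample matrix $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, chosen only because $e^A = \begin{bmatrix} \cos 1 & \sin 1 \\ -\sin 1 & \cos 1 \end{bmatrix}$ is known in closed form. Its (1,1) entries of $S_m$ oscillate around $\cos 1$ rather than increase, which is why the proof of Theorem 6.23 works with absolute values.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def partial_sum(A, m):
    """S_m = I + A + A^2/2! + ... + A^m/m!, built term by term."""
    n = len(A)
    S = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in S]                  # current term A^k / k!
    for k in range(1, m + 1):
        term = [[x / k for x in row] for row in matmul(term, A)]
        S = [[s + x for s, x in zip(rs, rt)] for rs, rt in zip(S, term)]
    return S

# Sample matrix (an assumption for illustration) with a known exponential.
A = [[0.0, 1.0], [-1.0, 0.0]]
limit = [[math.cos(1), math.sin(1)], [-math.sin(1), math.cos(1)]]

def error(m):
    S = partial_sum(A, m)
    return max(abs(S[i][j] - limit[i][j]) for i in range(2) for j in range(2))

for m in (2, 5, 10, 20):
    print(m, error(m))    # shrinks roughly like 1/(m+1)!
```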


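Example 6.18 For $D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}$, we have $D^k = \begin{bmatrix} \lambda_1^k & & 0 \\ & \ddots & \\ 0 & & \lambda_n^k \end{bmatrix}$ for any integer $k \geq 0$. Thus, the exponential matrix $e^D$ is
$$e^D = I + D + \frac{1}{2!}D^2 + \frac{1}{3!}D^3 + \cdots = \begin{bmatrix} \sum_{k=0}^{\infty}\frac{\lambda_1^k}{k!} & & 0 \\ & \ddots & \\ 0 & & \sum_{k=0}^{\infty}\frac{\lambda_n^k}{k!} \end{bmatrix} = \begin{bmatrix} e^{\lambda_1} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n} \end{bmatrix},$$
which coincides with the previous definition.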


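If A is diagonalizable and $A = QDQ^{-1}$ for some invertible matrix Q, then by Theorem 6.15, since $(QDQ^{-1})^k = QD^kQ^{-1}$ for all k,
$$e^A = e^{QDQ^{-1}} = I + QDQ^{-1} + \frac{(QDQ^{-1})^2}{2!} + \frac{(QDQ^{-1})^3}{3!} + \cdots = Q\left(I + D + \frac{D^2}{2!} + \frac{D^3}{3!} + \cdots\right)Q^{-1} = Qe^DQ^{-1},$$
whose computation is easy.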
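The identity $e^A = Qe^DQ^{-1}$ can also be checked numerically. The sketch below compares it with a truncated series for the matrix of Example 6.17, where $Q = [v_1\ v_2\ v_3]$ and $D = \mathrm{diag}(1, 1, -3)$. The inverse `Q_inv` was worked out by hand from cofactors, and the test confirms it. The 40-term truncation is an arbitrary choice.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expm_series(A, terms=40):
    """Truncated series I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    S = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in S]               # holds A^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        S = [[s + x for s, x in zip(rs, rt)] for rs, rt in zip(S, term)]
    return S

# Data from Example 6.17: A = Q D Q^{-1} with D = diag(1, 1, -3).
A = [[5, -4, 4], [12, -11, 12], [4, -4, 5]]
Q = [[1, -1, 1], [1, 0, 3], [0, 1, 1]]
Q_inv = [[3, -2, 3], [1, -1, 2], [-1, 1, -1]]     # computed by hand via cofactors
eD = [[math.e, 0, 0], [0, math.e, 0], [0, 0, math.exp(-3)]]

via_diag = matmul(matmul(Q, eD), Q_inv)            # Q e^D Q^{-1}
via_series = expm_series(A)
diff = max(abs(p - q) for rp, rq in zip(via_diag, via_series) for p, q in zip(rp, rq))
print(diff)   # agreement up to rounding
```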




3 


Example 6.19 Let A= 5 | | Then its eigenvalues are 1 and 2 with 


eigenvectors uy = | 1 | and u2 = | | respectively. Thus A = QDQ™! 
1 
1 


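with $D = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ and $Q = [u_1\ u_2]$. Therefore,
$$e^A = Qe^DQ^{-1} = Q\begin{bmatrix} e & 0 \\ 0 & e^2 \end{bmatrix}Q^{-1}.$$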


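Example 6.20 Let $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Then $A^k = \begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$ for all $k \geq 0$, so
$$e^A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} + \frac{1}{2!}\begin{bmatrix} 1 & 2 \\ 0 & 1 \end{bmatrix} + \frac{1}{3!}\begin{bmatrix} 1 & 3 \\ 0 & 1 \end{bmatrix} + \cdots = \begin{bmatrix} e & * \\ 0 & e \end{bmatrix}.$$
It is a good exercise to calculate the missing entry * directly.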


The following theorem shows some basic properties of the exponential 
matrices, whose proofs are easy, and are left for exercises. 


Theorem 6.24 (1) $e^{A+B} = e^Ae^B$, provided that AB = BA.
(2) In particular, the matrix $e^A$ is never singular for any square matrix A, and $(e^A)^{-1} = e^{-A}$.
(3) $e^{Q^{-1}AQ} = Q^{-1}e^AQ$ for any invertible matrix Q.
(4) If $\lambda_1, \lambda_2, \ldots, \lambda_n$ are the eigenvalues of a matrix A with associated eigenvectors $v_1, v_2, \ldots, v_n$, then the $e^{\lambda_i}$'s are the eigenvalues of $e^A$ with the same associated eigenvectors $v_i$'s, for $i = 1, 2, \ldots, n$. Moreover, $\det e^A = e^{\lambda_1}\cdots e^{\lambda_n} = e^{\operatorname{tr}(A)} \neq 0$ for any square matrix A.

Problem 6.16 Prove Theorem 6.24.

Problem 6.17 Prove that if A is skew-symmetric, then $e^A$ is orthogonal.
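Two parts of Theorem 6.24 are easy to probe numerically. The sketch below shows that the hypothesis AB = BA in (1) cannot be dropped, using two non-commuting nilpotent matrices chosen for illustration. It also checks $\det e^M = e^{\operatorname{tr}(M)}$ from (4) on a sample matrix M, which is likewise an assumption made for illustration. This is a numerical illustration, not a proof.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expm_series(A, terms=40):
    """Truncated series I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    S = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in S]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        S = [[s + x for s, x in zip(rs, rt)] for rs, rt in zip(S, term)]
    return S

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# (1) fails without AB = BA: these nilpotent matrices do not commute.
A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
lhs = expm_series([[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)])  # e^{A+B}
rhs = matmul(expm_series(A), expm_series(B))                                  # e^A e^B
print(lhs)   # [[cosh 1, sinh 1], [sinh 1, cosh 1]]
print(rhs)   # [[2.0, 1.0], [1.0, 1.0]]

# (4) det e^M = e^{tr M} on a sample matrix (an assumption for illustration).
M = [[1.0, 2.0], [3.0, 4.0]]
print(det2(expm_series(M, 50)), math.exp(M[0][0] + M[1][1]))
```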


In general, the computation of $e^A$ is not easy at all if A is not diagonalizable. However, for a triangular matrix of order 2, one can still compute it directly.




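Example 6.21 For $A = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}$, one can compute $e^A$ as a simple application of Theorem 6.24(1). We first write it as $A = 2I + N$, where $N = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$. Since $(2I)N = N(2I)$, Theorem 6.24(1) gives
$$e^A = e^{2I + N} = e^{2I}e^N.$$
From the direct computation of the series expansion, we get $e^{2I} = e^2I$. Moreover, since $N^k = 0$ for $k \geq 2$, we have $e^N = I + N + \frac{N^2}{2!} + \cdots = I + N = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. Thus,
$$e^A = e^2\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} e^2 & e^2 \\ 0 & e^2 \end{bmatrix}.$$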
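The splitting $A = 2I + N$ can be compared with the plain series. The sketch below checks $e^{2I+N} = e^2(I + N)$ for $N = \begin{bmatrix} 0 & s \\ 0 & 0 \end{bmatrix}$. The case s = 1 is the matrix of Example 6.21 as read here, and s = 3 is an extra sample value. The function `via_splitting` is an illustrative name only.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expm_series(A, terms=40):
    """Truncated series I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    S = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in S]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        S = [[s + x for s, x in zip(rs, rt)] for rs, rt in zip(S, term)]
    return S

def via_splitting(s):
    """e^{2I + N} = e^2 (I + N) for N = [[0, s], [0, 0]]: N^2 = 0 and 2I commutes with N."""
    e2 = math.exp(2)
    return [[e2, e2 * s], [0.0, e2]]

errors = {}
for s in (1.0, 3.0):
    A = [[2.0, s], [0.0, 2.0]]
    E = expm_series(A)
    errors[s] = max(abs(p - q) for rp, rq in zip(E, via_splitting(s)) for p, q in zip(rp, rq))
    print(s, E, via_splitting(s))
```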


Problem 6.18 Compute $e^A$ for $A = \begin{bmatrix} 2 & 3 & 0 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{bmatrix}$.


One of the most prominent applications of exponential matrices is to the theory of linear differential equations. In fact, we show that a general solution of $y'(t) = Ay(t)$ is of the form $y(t) = e^{tA}y_0$.

Lemma 6.25 For any $t \in \mathbb{R}$ and any square matrix A, the exponential matrix
$$e^{tA} = I + tA + \frac{t^2}{2!}A^2 + \frac{t^3}{3!}A^3 + \cdots$$
is a differentiable function of t, and $\frac{d}{dt}e^{tA} = Ae^{tA}$.

Proof: By absolute convergence of the series expansion of $e^{tA}$, one can use term-by-term differentiation, i.e.,
$$\frac{d}{dt}e^{tA} = \frac{d}{dt}\sum_{k=0}^{\infty}\frac{t^k}{k!}A^k = \sum_{k=1}^{\infty}\frac{t^{k-1}}{(k-1)!}A^k = A\sum_{k=0}^{\infty}\frac{t^k}{k!}A^k = Ae^{tA}.$$
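Lemma 6.25 can be illustrated with a central difference quotient. The sketch below compares $(e^{(t+h)A} - e^{(t-h)A})/(2h)$ with $Ae^{tA}$. The sample matrix (with eigenvalues −1 and −2), the time t and the step h are all assumptions made for illustration.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expm_series(A, terms=40):
    """Truncated series I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    S = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in S]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        S = [[s + x for s, x in zip(rs, rt)] for rs, rt in zip(S, term)]
    return S

def scale(c, M):
    return [[c * x for x in row] for row in M]

# Sample matrix (an assumption for illustration) with eigenvalues -1 and -2.
A = [[0.0, 1.0], [-2.0, -3.0]]
t, h = 0.7, 1e-5

forward = expm_series(scale(t + h, A))
backward = expm_series(scale(t - h, A))
numeric = [[(f - b) / (2 * h) for f, b in zip(rf, rb)] for rf, rb in zip(forward, backward)]
exact = matmul(A, expm_series(scale(t, A)))       # A e^{tA}, as in Lemma 6.25
err = max(abs(p - q) for rp, rq in zip(numeric, exact) for p, q in zip(rp, rq))
print(err)   # small: O(h^2) truncation error plus rounding
```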


The following theorem is a direct consequence of Lemma 6.25. 




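Theorem 6.26 For any n × n matrix A,
$$y(t) = e^{tA}y_0$$
is the unique solution of the linear differential equation $y' = Ay$ with initial condition $y(0) = y_0$.

In particular, if A is diagonalizable by a transition matrix $Q = [v_1 \cdots v_n]$ consisting of n eigenvectors of A belonging to the eigenvalues $\lambda_1, \ldots, \lambda_n$, then a general solution of a system $y' = Ay$ is obtained as
$$y(t) = e^{tA}y_0 = e^{tQDQ^{-1}}y_0 = Qe^{tD}Q^{-1}y_0 = [v_1 \cdots v_n]\begin{bmatrix} e^{\lambda_1 t} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_n t} \end{bmatrix}\begin{bmatrix} c_1 \\ \vdots \\ c_n \end{bmatrix} = c_1e^{\lambda_1 t}v_1 + c_2e^{\lambda_2 t}v_2 + \cdots + c_ne^{\lambda_n t}v_n,$$
where $(c_1, \ldots, c_n) = Q^{-1}y_0$ is determined from an initial condition $y_0$.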


Corollary 6.27 If an n × n matrix A satisfies $A = QDQ^{-1}$ with $Q = [v_1 \cdots v_n]$ and D the diagonal matrix with diagonal entries $\lambda_1, \ldots, \lambda_n$, then a general solution of the system of linear differential equations $y'(t) = Ay(t)$ is of the form
$$y(t) = Qe^{tD}Q^{-1}y_0 = c_1e^{\lambda_1 t}v_1 + c_2e^{\lambda_2 t}v_2 + \cdots + c_ne^{\lambda_n t}v_n.$$
If an initial condition $y_0 = (d_1, \ldots, d_n)$ is given, the particular solution is determined by the constants $(c_1, \ldots, c_n) = Q^{-1}y_0$.

Note that $\{y_i(t) = e^{\lambda_i t}v_i : i = 1, \ldots, n\}$, determined by the eigenvectors $y_i(0) = v_i$ of A, is a fundamental set of solutions. Consequently, when A is diagonalizable, a general solution of $y'(t) = Ay(t)$ may be obtained directly from a basis of eigenvectors and the eigenvalues of A, without looking for an individual fundamental set.


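Example 6.22 [Revisit Example 6.15] For $A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$, solve $y'(t) = Ay(t)$ with initial condition $y_1(0) = 1$, $y_2(0) = 0$.

Solution: (1) The eigenvalues of A are $\lambda_1 = 1$ and $\lambda_2 = -1$, with associated eigenvectors $v_1 = [1\ 1]^T$ and $v_2 = [1\ -1]^T$, respectively.
(2) By setting $Q = [v_1\ v_2] = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$, one gets $Q^{-1} = \frac{1}{2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$ and $Q^{-1}AQ = D = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.
(3) A general solution is
$$y(t) = e^{tA}y_0 = Qe^{tD}Q^{-1}y_0 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} e^{t} & 0 \\ 0 & e^{-t} \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix},$$
where $\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = Q^{-1}\begin{bmatrix} d_1 \\ d_2 \end{bmatrix}$. Since $d_1 = 1$ and $d_2 = 0$, we get $c_1 = c_2 = \frac{1}{2}$ and
$$y(t) = \frac{1}{2}\begin{bmatrix} e^{t} + e^{-t} \\ e^{t} - e^{-t} \end{bmatrix} = \begin{bmatrix} \cosh t \\ \sinh t \end{bmatrix},$$
as we expected from Example 6.15.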
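As a cross-check of Example 6.22, the sketch below evaluates $e^{tA}y_0$ from a truncated series (40 terms, an arbitrary choice) and compares it with $(\cosh t, \sinh t)$ at a few sample times. The helper `solve` is an illustrative name only.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expm_series(A, terms=40):
    """Truncated series I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    S = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in S]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        S = [[s + x for s, x in zip(rs, rt)] for rs, rt in zip(S, term)]
    return S

A = [[0.0, 1.0], [1.0, 0.0]]
y0 = [1.0, 0.0]

def solve(t):
    """y(t) = e^{tA} y0."""
    E = expm_series([[t * a for a in row] for row in A])
    return [sum(e * v for e, v in zip(row, y0)) for row in E]

for t in (0.0, 1.3, -2.0):
    print(t, solve(t), [math.cosh(t), math.sinh(t)])
```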


1 = 
Problem 6.19 Solve the system { n =n +t p 

y = 4y — w. 

Problem 6.20 Solve the system
$$\begin{aligned} y_1' &= 4y_1 + y_3 \\ y_2' &= -2y_1 + y_2 \\ y_3' &= -2y_1 + y_3 \end{aligned}$$
and find the particular solution of the system satisfying the initial conditions $y_1(0) = -1$, $y_2(0) = 1$, $y_3(0) = 0$.


If A is not diagonalizable, then it is not easy to compute $e^{tA}$ directly. However, one can still reduce A to a simpler form called the Jordan canonical form, which will be introduced in Chapter 8 and with which the computation of $e^{tA}$ is made relatively easy. The matrix in the following example is not diagonalizable, but the computation of $e^{tA}$ is possible and will be treated in general in Chapter 8.


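Example 6.23 Solve the system $y' = Ay$ of linear differential equations with initial condition $y(0) = y_0$, where
$$A = \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}, \qquad y_0 = \begin{bmatrix} a \\ b \end{bmatrix}.$$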




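Solution: First note that A has an eigenvalue λ of multiplicity 2 and is not diagonalizable. One can rewrite A as
$$A = \begin{bmatrix} \lambda & 0 \\ 0 & \lambda \end{bmatrix} + \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} = \lambda I + N.$$
Then, by the same argument as in Example 6.21,
$$e^{tA} = e^{t(\lambda I + N)} = e^{\lambda t}e^{tN} = e^{\lambda t}\begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}.$$
Therefore, the solution is
$$y = e^{tA}y_0 = e^{\lambda t}\begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} (a + bt)e^{\lambda t} \\ be^{\lambda t} \end{bmatrix} = ae^{\lambda t}\begin{bmatrix} 1 \\ 0 \end{bmatrix} + be^{\lambda t}\begin{bmatrix} t \\ 1 \end{bmatrix}.$$
In terms of components, $y_1 = (a + bt)e^{\lambda t}$ and $y_2 = be^{\lambda t}$.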
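The formula $e^{tA} = e^{\lambda t}\begin{bmatrix} 1 & t \\ 0 & 1 \end{bmatrix}$ for the non-diagonalizable matrix of Example 6.23 can be compared with the truncated series. The values of λ, a, b and t below are hypothetical and chosen only for illustration.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expm_series(A, terms=40):
    """Truncated series I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    S = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in S]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        S = [[s + x for s, x in zip(rs, rt)] for rs, rt in zip(S, term)]
    return S

lam, a, b, t = -0.5, 2.0, 1.0, 2.0        # hypothetical values of lambda, y0 = (a, b) and t
A = [[lam, 1.0], [0.0, lam]]
E = expm_series([[t * x for x in row] for row in A])
y_series = [E[0][0] * a + E[0][1] * b, E[1][0] * a + E[1][1] * b]
y_formula = [(a + b * t) * math.exp(lam * t), b * math.exp(lam * t)]
print(y_series, y_formula)
```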


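Example 6.24 Find the general solution of the system $y' = Ay$, where
$$A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}.$$
Solution: Note that the eigenvalues of A are $a \pm ib$, which are not real unless b = 0, in which case the matrix is already in diagonal form. If $b \neq 0$, then they are complex eigenvalues; this case will be discussed in Chapter 8. However, one can compute $e^{tA}$ directly without using diagonalization. We first write A as
$$A = a\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + b\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix} = aI + bJ.$$
Clearly IJ = JI, so $e^{tA} = e^{atI + btJ} = e^{atI}e^{btJ} = e^{at}e^{btJ}$. Since
$$J^2 = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix} = -I, \qquad J^3 = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix} = -J, \qquad J^4 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I,$$
one can deduce that $J^k = J^{k+4}$ for all $k = 1, 2, \ldots$, and, moreover,
$$e^{btJ} = I + btJ + \frac{(bt)^2J^2}{2!} + \frac{(bt)^3J^3}{3!} + \frac{(bt)^4J^4}{4!} + \cdots = \left(1 - \frac{(bt)^2}{2!} + \frac{(bt)^4}{4!} - \cdots\right)I + \left(bt - \frac{(bt)^3}{3!} + \frac{(bt)^5}{5!} - \cdots\right)J = \begin{bmatrix} \cos bt & -\sin bt \\ \sin bt & \cos bt \end{bmatrix}$$
for any constants b and t. Thus, the general solution of $y' = Ay$ is
$$y(t) = e^{tA}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = e^{at}e^{btJ}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix} = e^{at}\begin{bmatrix} \cos bt & -\sin bt \\ \sin bt & \cos bt \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \end{bmatrix}.$$
In terms of components,
$$\begin{aligned} y_1 &= e^{at}(c_1\cos bt - c_2\sin bt), \\ y_2 &= e^{at}(c_1\sin bt + c_2\cos bt). \end{aligned}$$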
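The closed form for $e^{tA}$ in Example 6.24 can be compared with the truncated series. The sketch below does this for hypothetical values of a, b and t.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expm_series(A, terms=40):
    """Truncated series I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    S = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in S]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        S = [[s + x for s, x in zip(rs, rt)] for rs, rt in zip(S, term)]
    return S

a, b, t = 0.2, 3.0, 1.0                     # hypothetical values
A = [[a, -b], [b, a]]
E = expm_series([[t * x for x in row] for row in A])
g = math.exp(a * t)
R = [[g * math.cos(b * t), -g * math.sin(b * t)],     # e^{at} times a rotation by bt
     [g * math.sin(b * t), g * math.cos(b * t)]]
err = max(abs(p - q) for rp, rq in zip(E, R) for p, q in zip(rp, rq))
print(err)   # rounding-level
```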


Problem 6.21 Solve the system $y' = Ay$ with initial condition $y(0) = y_0$ by computing $e^{tA}y_0$ for


Remark: Consider the n-th order homogeneous linear differential equation
$$\frac{d^ny}{dt^n} + a_1\frac{d^{n-1}y}{dt^{n-1}} + a_2\frac{d^{n-2}y}{dt^{n-2}} + \cdots + a_ny = 0,$$
where the $a_i$ are constants, for a function y(t) on an interval J = (a, b). The fundamental theorem of differential equations says that this kind of equation also has a unique solution y(t) on J for a given initial condition. That is, for a point $t_0$ in J and arbitrary constants $c_0, \ldots, c_{n-1}$, there is a unique solution y = y(t) of the equation such that $y(t_0) = c_0$, $y'(t_0) = c_1$, ..., $y^{(n-1)}(t_0) = c_{n-1}$. This differential equation may be converted into a system of linear differential equations as follows. Set
$$y_1 = y, \quad y_2 = y_1' = \frac{dy}{dt}, \quad y_3 = y_2' = \frac{d^2y}{dt^2}, \quad \ldots, \quad y_n = y_{n-1}' = \frac{d^{n-1}y}{dt^{n-1}};$$
then clearly
$$\frac{dy_n}{dt} = \frac{d^ny}{dt^n} = -a_1y_n - a_2y_{n-1} - \cdots - a_{n-1}y_2 - a_ny_1.$$
In matrix notation, this can be written as
$$y'(t) = \begin{bmatrix} y_n' \\ y_{n-1}' \\ \vdots \\ y_2' \\ y_1' \end{bmatrix} = \begin{bmatrix} -a_1 & -a_2 & \cdots & -a_{n-1} & -a_n \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}\begin{bmatrix} y_n \\ y_{n-1} \\ \vdots \\ y_2 \\ y_1 \end{bmatrix} = Ay(t).$$
This is just a system of linear differential equations with the companion matrix A, which is exactly the same form as a linear difference equation treated in Section 6.3. Therefore, the general solution of the original differential equation is $y(t) = y_1(t)$ in the solution of $y'(t) = Ay(t)$. When A has distinct eigenvalues $\lambda_1, \ldots, \lambda_n$, it is of the form
$$y(t) = d_1e^{\lambda_1 t} + \cdots + d_ne^{\lambda_n t},$$
for $t \in J$. In Chapter 8, we will discuss the case of eigenvalues with multiplicity.
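The companion-matrix reduction in the Remark above can be sketched as follows. The function `companion` builds the matrix acting on $(y_n, \ldots, y_1) = (y^{(n-1)}, \ldots, y)$, and the solution y is read off as the last component of $e^{tA}$ applied to the initial state. The sample equation $y'' - 3y' + 2y = 0$ with $y(0) = 1$, $y'(0) = 0$ is an assumption for illustration. Its eigenvalues are 1 and 2, and its solution is $2e^t - e^{2t}$.

```python
import math

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def expm_series(A, terms=40):
    """Truncated series I + A + A^2/2! + ... + A^(terms-1)/(terms-1)!."""
    n = len(A)
    S = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in S]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]
        S = [[s + x for s, x in zip(rs, rt)] for rs, rt in zip(S, term)]
    return S

def companion(coeffs):
    """Companion matrix of y^(n) + a_1 y^(n-1) + ... + a_n y = 0, acting on the
    vector (y_n, ..., y_2, y_1) = (y^(n-1), ..., y', y), as in the Remark."""
    n = len(coeffs)
    A = [[0.0] * n for _ in range(n)]
    A[0] = [-float(a) for a in coeffs]
    for i in range(1, n):
        A[i][i - 1] = 1.0
    return A

# Sample equation (an assumption): y'' - 3y' + 2y = 0, i.e. a_1 = -3, a_2 = 2.
A = companion([-3, 2])
t = 0.8
E = expm_series([[t * x for x in row] for row in A])
state0 = [0.0, 1.0]                                   # (y'(0), y(0))
y_t = sum(E[1][j] * state0[j] for j in range(2))      # last component is y_1 = y
closed = 2 * math.exp(t) - math.exp(2 * t)            # d_1 e^t + d_2 e^{2t}, d_1 = 2, d_2 = -1
print(A, y_t, closed)
```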


6.5 Diagonalization of linear transformations 


Recall that two matrices are similar if and only if they can be the matrix 
representations of the same linear transformation, and similar matrices have 
the same eigenvalues. In this section, we aim to find a basis α such that the matrix representation of a linear transformation with respect to α is a diagonal matrix. We start with the eigenvalues and the eigenvectors of a linear transformation.

Definition 6.9 Let V be an n-dimensional vector space, and let $T : V \to V$ be a linear transformation on V. Then the eigenvalues and eigenvectors of T are defined by the same equation, $Tx = \lambda x$, with a nonzero vector $x \in V$.


In practice, the eigenvalues of T are computed as those of the matrix representation $[T]_\alpha$ of T with respect to a basis α for V. This is well defined, since $[T]_\alpha$ is similar to $[T]_\beta$ for any other basis β for V, and their eigenvalues are the same by Theorem 6.7.

The eigenvectors of T can also be found from the eigenvectors of its matrix representation. Let $\alpha = \{v_1, v_2, \ldots, v_n\}$ be a basis for V. Then the natural isomorphism $\Phi : V \to \mathbb{R}^n$ identifies the associated matrix $A = [T]_\alpha : \mathbb{R}^n \to \mathbb{R}^n$ with the linear transformation $T : V \to V$ via the following commutative diagram.

Let λ be an eigenvalue of A (and hence of T). Then $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ is an eigenvector of the matrix A belonging to λ ($Ax = \lambda x$) if and only if $\Phi^{-1}(x) = v = x_1v_1 + x_2v_2 + \cdots + x_nv_n \in V$ is an eigenvector of T ($T(v) = \lambda v$), because the commutativity of the diagram shows that
$$[T(v)]_\alpha = [T]_\alpha[v]_\alpha = Ax = \lambda x = [\lambda v]_\alpha.$$
Therefore, if $x_1, x_2, \ldots, x_k$ are linearly independent eigenvectors of $A = [T]_\alpha$, then $\Phi^{-1}(x_1), \Phi^{-1}(x_2), \ldots, \Phi^{-1}(x_k)$ are linearly independent eigenvectors of T. Hence, by Theorem 6.1, the linear transformation T has a diagonal matrix representation if and only if it has n linearly independent eigenvectors.

The following example illustrates how to find a diagonal matrix repre- 
sentation of a linear transformation on a vector space. 


Example 6.25 Let $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by
$$(Tf)(x) = f(x) + xf'(x) + f'(x).$$
Find a basis for $P_2(\mathbb{R})$ with respect to which the matrix of T is diagonal.

Solution: To find the eigenvalues and the eigenvectors of T, take a basis for the vector space $P_2(\mathbb{R})$, say $\alpha = \{1, x, x^2\}$. Then the matrix of T with respect to α is
$$[T]_\alpha = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{bmatrix},$$
which is upper triangular. Hence, the eigenvalues of T are $\lambda_1 = 1$, $\lambda_2 = 2$ and $\lambda_3 = 3$. By a simple computation, one can verify that the vectors $x_1 = (1, 0, 0)$, $x_2 = (1, 1, 0)$ and $x_3 = (1, 2, 1)$ are eigenvectors of $[T]_\alpha$ in $\mathbb{R}^3$ belonging to the eigenvalues $\lambda_1, \lambda_2, \lambda_3$, respectively. The corresponding eigenvectors of T in $P_2(\mathbb{R})$ are $f_1(x) = 1$, $f_2(x) = 1 + x$ and $f_3(x) = 1 + 2x + x^2$, respectively. Since the eigenvalues $\lambda_1, \lambda_2, \lambda_3$ are all distinct, the eigenvectors $\{x_1, x_2, x_3\}$ of $[T]_\alpha$ are linearly independent, and so is $\beta = \{f_1, f_2, f_3\}$ in $P_2(\mathbb{R})$. Thus, each $f_i$ is a basis for the eigenspace $E(\lambda_i)$ of T belonging to $\lambda_i$, for i = 1, 2, 3, and the transition matrix is
$$Q = [Id]_\beta^\alpha = [x_1\ x_2\ x_3] = [[f_1]_\alpha\ [f_2]_\alpha\ [f_3]_\alpha] = \begin{bmatrix} 1 & 1 & 1 \\ 0 & 1 & 2 \\ 0 & 0 & 1 \end{bmatrix}.$$
Hence, by changing the basis α to β, the matrix representation of T becomes a diagonal matrix:
$$[T]_\beta = Q^{-1}[T]_\alpha Q = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} = D.$$
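Example 6.25 can be checked with exact integer arithmetic. The sketch below represents a polynomial $c_0 + c_1x + c_2x^2$ by its coefficient list and builds $[T]_\alpha$ column by column from the images of 1, x and $x^2$. It then verifies that $Q^{-1}[T]_\alpha Q$ is diagonal. The inverse `Q_inv` was computed by hand, and the test confirms it.

```python
def T(c):
    """(Tf)(x) = f(x) + x f'(x) + f'(x) on coefficient lists c = [c0, c1, c2]."""
    c0, c1, c2 = c
    deriv = [c1, 2 * c2, 0]          # coefficients of f'(x)
    x_deriv = [0, c1, 2 * c2]        # coefficients of x f'(x)
    return [p + q + r for p, q, r in zip(c, x_deriv, deriv)]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]                       # alpha = {1, x, x^2}
cols = [T(e) for e in basis]
T_alpha = [[cols[j][i] for j in range(3)] for i in range(3)]    # columns are [T(v_j)]_alpha
Q = [[1, 1, 1], [0, 1, 2], [0, 0, 1]]                           # columns [f_i]_alpha
Q_inv = [[1, -1, 1], [0, 1, -2], [0, 0, 1]]                     # worked out by hand
print(T_alpha)                                   # [[1, 1, 0], [0, 2, 2], [0, 0, 3]]
print(matmul(matmul(Q_inv, T_alpha), Q))         # [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
```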


Note that if T = A is an n × n square matrix written in column vectors, $A = [c_1 \cdots c_n]$, then the linear transformation $A : \mathbb{R}^n \to \mathbb{R}^n$ is given by $A(e_i) = c_i$, $i = 1, \ldots, n$. Thus A itself is just the matrix representation with respect to the standard basis $\alpha = \{e_1, \ldots, e_n\}$ for $\mathbb{R}^n$, say $A = [A]_\alpha$. Now, if there is a basis $\beta = \{x_1, \ldots, x_n\}$ of n linearly independent eigenvectors of A, then the natural isomorphism $\Phi : \mathbb{R}^n \to \mathbb{R}^n$ defined by $\Phi(x_i) = e_i$ is simply a change of basis by the transition matrix $Q = [Id]_\beta^\alpha = [x_1 \cdots x_n]$, and the matrix representation of A with respect to β is a diagonal matrix:
$$D = [A]_\beta = Q^{-1}[A]_\alpha Q = Q^{-1}AQ.$$


Problem 6.22 Let $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by $T(f(x)) = f(x) + xf'(x)$. Find all the eigenvalues of T, and find a basis α for $P_2(\mathbb{R})$ such that $[T]_\alpha$ is a diagonal matrix.


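Problem 6.23 Let $M_{2\times 2}(\mathbb{R})$ be the vector space of all real 2 × 2 matrices, and let T be the linear transformation on $M_{2\times 2}(\mathbb{R})$ defined by
$$T\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a + b + d & a + b + c \\ b + c + d & a + c + d \end{bmatrix}.$$
Find the eigenvalues and a basis for each of the eigenspaces of T, and diagonalize T.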


Problem 6.24 Let T be the linear transformation on $\mathbb{R}^3$ defined by $T(x, y, z) = (4x + z, 2x + 3y + 2z, x + 4z)$. Find all the eigenvalues of T and their eigenvectors, and diagonalize T.




6.6 Exercises 


6.1. Find the eigenvalues and eigenvectors for the given matrix, if they exist. 


1 —4 -1 
Gr ae @|3s 2 sf, 
1 1 3 
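$$\text{(3)}\ \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}, \qquad \text{(4)}\ \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{bmatrix},$$
$$\text{(5)}\ \begin{bmatrix} 2 & -1 & 0 & -1 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ -1 & 0 & -1 & 2 \end{bmatrix}, \qquad \text{(6)}\ \begin{bmatrix} 1 & -5 & 0 & 0 \\ 5 & 1 & 0 & 0 \\ 0 & 0 & 1 & -2 \\ 0 & 0 & 2 & 1 \end{bmatrix}.$$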


6.2. Find the characteristic polynomial, eigenvalues and eigenvectors of the matrix
$$A = \begin{bmatrix} -2 & 0 & 0 \\ 3 & 2 & 3 \\ 4 & -1 & 6 \end{bmatrix}.$$
6.3. Show that a 2 × 2 matrix $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ has
(1) two distinct real eigenvalues if $(a - d)^2 + 4bc > 0$,
(2) one eigenvalue if $(a - d)^2 + 4bc = 0$,
(3) no real eigenvalues if $(a - d)^2 + 4bc < 0$,
(4) only real eigenvalues if it is symmetric (i.e., b = c).
6.4. Suppose that a 3 × 3 matrix A has eigenvalues −1, 0, 1 with eigenvectors u, v, w, respectively. Describe the null space N(A) and the column space C(A).

6.5. If a 3 × 3 matrix A has eigenvalues 1, 2, 3, what are the eigenvectors of $B = (A - I)(A - 2I)(A - 3I)$?

6.6. Show that any 2 × 2 skew-symmetric nonzero matrix has no real eigenvalue.

6.7. Find a 3 × 3 matrix that has the eigenvalues $\lambda_1 = 1$, $\lambda_2 = 2$, $\lambda_3 = 3$ with the associated eigenvectors $x_1 = (2, -1, 0)$, $x_2 = (-1, 2, -1)$, $x_3 = (0, -1, 2)$.

6.8. Let P be a projection matrix that projects $\mathbb{R}^n$ onto a subspace W. Find the eigenvalues and the eigenspaces of P.

6.9. Let u, v be n × 1 column vectors, and let $A = uv^T$. Show that u is an eigenvector of A, and find the eigenvalues and the eigenvectors of A.

6.10. Show that if λ is an eigenvalue of an idempotent n × n matrix A (i.e., $A^2 = A$), then λ must be either 0 or 1.


6.11. Prove that if A is an idempotent matrix, then tr(A) = rank A.

6.12. Let $A = [a_{ij}]$ be an n × n matrix with eigenvalues $\lambda_1, \ldots, \lambda_n$. Show that
$$\lambda_j = a_{jj} + \sum_{i \neq j}(a_{ii} - \lambda_i) \quad \text{for } j = 1, \ldots, n.$$
tFj 


6.13. Let A be a diagonalizable n × n matrix, and let U be a subspace of $\mathbb{R}^n$ invariant under A (i.e., $AU \subseteq U$).
(1) Suppose that $x_1, \ldots, x_k$ are eigenvectors of A belonging to distinct eigenvalues. Prove that if $x_1 + \cdots + x_k$ is in U, then $x_i \in U$ for all $i = 1, \ldots, k$. (Hint: Use induction on k.)
(2) Prove that the restriction of A to any nontrivial invariant subspace is also diagonalizable.


6.14. Let A and B be two diagonalizable matrices. Prove that they are simultaneously diagonalizable (i.e., there exists an invertible matrix Q such that both $Q^{-1}AQ$ and $Q^{-1}BQ$ are diagonal) if and only if AB = BA. (See Exercise 7.18.)


6.15. Let $D : P_3(\mathbb{R}) \to P_3(\mathbb{R})$ be the differentiation defined by $Df(x) = f'(x)$ for $f \in P_3(\mathbb{R})$. Find all eigenvalues and eigenvectors of D and of $D^2$.


6.16. Let $T : P_2(\mathbb{R}) \to P_2(\mathbb{R})$ be the linear transformation defined by
$$T(a_2x^2 + a_1x + a_0) = (a_0 + a_1)x^2 + (a_1 + a_2)x + (a_0 + a_2).$$
Find a basis for $P_2(\mathbb{R})$ with respect to which the matrix representation of T is diagonal.


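6.17. Determine whether or not each of the following matrices is diagonalizable.
$$\text{(1)}\ \begin{bmatrix} 2 & 1 & -1 \\ 1 & 0 & 2 \\ -1 & 2 & 3 \end{bmatrix}, \qquad \text{(2)}\ \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 1 & 2 \end{bmatrix}, \qquad \text{(3)}\ \begin{bmatrix} 3 & 0 & 2 \\ 0 & 2 & 0 \\ -2 & 0 & -1 \end{bmatrix}.$$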
6.18. Find an orthogonal matrix Q and a diagonal matrix D such that $Q^TAQ = D$ for
-3 2 4 1 12 0 | 1 0 0 ] 
(1) A= 2 —6 2 |, (2)A=]2 2 2), 8G)A=] O01 1]. 
4 2 3 | 023] EN 
6.19. Calculate $A^{10}x$ for $A = \begin{bmatrix} 1 & 2 & -1 \\ 0 & 5 & -2 \\ 0 & 6 & -2 \end{bmatrix}$ and $x = \begin{bmatrix} 2 \\ 4 \\ 7 \end{bmatrix}$.


6.20. For n ≥ 1, let $a_n$ denote the number of subsets of {1, 2, ..., n} that contain no consecutive integers. Find the number $a_n$ for all n ≥ 1.




6.21. 


6.22. 


6.23. 


6.24. 


6.25. 


6.26. 


6.27. 




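Evaluate $\det A_n$, where $A_n$ is the n × n {0, 1}-matrix with 1's on the main diagonal and on its two adjacent parallel diagonals:
$$A_n = \begin{bmatrix} 1 & 1 & 0 & 0 & \cdots & 0 & 0 & 0 \\ 1 & 1 & 1 & 0 & \cdots & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 & \cdots & 0 & 0 & 0 \\ \vdots & & & \ddots & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & 0 & \cdots & 1 & 1 & 0 \\ 0 & 0 & 0 & 0 & \cdots & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & \cdots & 0 & 1 & 1 \end{bmatrix}.$$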


0 
0.6 
For $x_0 = (1, 1)$, calculate $\lim_{k\to\infty} x_k$, where $x_k = Ax_{k-1}$, $k = 1, 2, \ldots$.


Let A = | = | . Find a value x so that A has an eigenvalue A = 1. 


Compute e^ for 

1 1 3 1 
dasla | aea 
In 1985, the initial status of the car owners in a city was reported as follows: 
40% of the car owners drove large cars, 20% drove medium-sized cars, and 
40% drove small cars. In 1995, 70% of the large-car owners in 1985 still owned 
large cars, but 30% had changed to a medium-sized car. Of those who owned 
medium-sized cars in 1985, 10% had changed to large cars, 70% continued 
to drive medium-sized cars, and 20% had changed to small cars. Finally, of 
those who owned the small cars in 1985, 10% had changed to medium-sized 
cars and 90% still owned small cars in 1995. Assuming that these trends 
continue, and that no car owners are born, die or otherwise add realism to 
the problem, determine the percentage of car owners who will own cars of 
each size in 2025. 


1 1 
Let A=| 6 A 


(1) Compute e^ directly from the expansion. 
(2) Compute e^ by diagonalizing A. 


Let A(t) be a matrix whose entries are all differentiable functions in t and which is invertible for all t. Compute the following:
(1) $\frac{d}{dt}A(t)^3$, (2) $\frac{d}{dt}A(t)^{-1}$.
Solve y’ = Ay, where 
6 2 8 ] 2 1 
(1) A=] -1 8 4 and y(l)=] 1}. 
2 -12 —6 | o | 




6.28. 


6.29. 


6.30. 


y = n — yo + W 
Solve the system 4 yf = 3yı + 4ys3 
3 = WM + y 


Y3 
with initial conditions yı (0) = 0, y2(0) = 2, y3(0) = 1. 


Let $f(\lambda) = \det(\lambda I - A)$ be the characteristic polynomial of A. Evaluate f(A) for


3 1 1 1 2 2 
(AS | 242, @®BA=| 12 
1 1 8 -1 1 4 
In fact, f(A) = 0 for any square matrix A and its characteristic polynomial f(λ) (this is the Cayley-Hamilton theorem).


Determine whether the following statements are true or false, in general, and 
justify your answers. 


(1) If B is obtained from A by interchanging two rows, then B is similar 
to A. 

(2) If A and B are diagonalizable, so is AB. 

(3) Every invertible matrix is diagonalizable. 

(4) Every diagonalizable matrix is invertible. 

(5) Interchanging the rows of a 2 × 2 matrix reverses the signs of its eigenvalues.

(6) A matrix A cannot be similar to A+ I. 

(7) The sum of the eigenvalues of A+B equals the sum of all the individual 
eigenvalues of A and B. 


Chapter 7 


Complex Vector Spaces 


7.1 The complex n-space $\mathbb{C}^n$


So far, we have been dealing with real vector spaces and real matrices only. However, as we have seen in Chapter 6, real scalars are not enough to solve many important problems in linear algebra. For example, some characteristic polynomials, even with real coefficients, may have complex roots that are not real numbers. For instance, the matrix $A = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}$ has two non-real complex eigenvalues $\lambda = 1 \pm i$.

Thus, it is indispensable to work with complex numbers and to extend the concept of real vector spaces to that of complex vector spaces. Note that, in the field of complex numbers, any polynomial of degree n (even with complex coefficients) has n complex roots counting multiplicities; this is well known as the fundamental theorem of algebra. Hence, any square matrix of order n now has precisely n eigenvalues counting multiplicities, and so one can continue the study of diagonalization in this setting.

A complex vector space is a vector space whose scalars are the complex numbers. The most standard complex vector space is the complex n-space $\mathbb{C}^n$, which is the set of all ordered n-tuples $(z_1, z_2, \ldots, z_n)$ of complex numbers:
$$\mathbb{C}^n = \{(z_1, z_2, \ldots, z_n) : z_i \in \mathbb{C},\ i = 1, 2, \ldots, n\}.$$
In $\mathbb{C}^n$, the addition and the scalar multiplication are defined as follows:
$$(z_1, z_2, \ldots, z_n) + (w_1, w_2, \ldots, w_n) = (z_1 + w_1, z_2 + w_2, \ldots, z_n + w_n),$$
$$k(z_1, z_2, \ldots, z_n) = (kz_1, kz_2, \ldots, kz_n), \quad \text{for } k \in \mathbb{C}.$$
The standard basis for the space $\mathbb{C}^n$ is again $\{e_1, e_2, \ldots, e_n\}$, as in the real case, but the scalars are now complex numbers, so that any vector z in $\mathbb{C}^n$ is of the form $z = \sum_k z_ke_k$ with $z_k = x_k + iy_k \in \mathbb{C}$, i.e., z = x + iy with $x, y \in \mathbb{R}^n$.

In a complex vector space, where the scalars are allowed to be complex numbers, all the basic concepts and theorems previously defined and proven for the real case, except for those in Chapter 5, remain true.

However, the definition of the inner product in a complex vector space needs to be slightly modified from the real case. Note that, by identifying a complex number z = x + iy with a point (x, y) in the plane $\mathbb{R}^2$, the modulus (or magnitude) of z is given as the nonnegative real number $|z| = \sqrt{x^2 + y^2}$, which is $\sqrt{\bar{z}z}$, where $\bar{z}$ is the complex conjugate of z. Therefore, the length of a vector $z = (z_1, \ldots, z_n) \in \mathbb{C}^n$ has to be defined as $\|z\| = \sqrt{|z_1|^2 + \cdots + |z_n|^2}$, where $|z_k|^2 = \bar{z}_kz_k$, and not as $\|z\|^2 = z_1^2 + \cdots + z_n^2$. With the latter formula, the nonzero vector z = (1, i) would have length 0.
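Definition 7.1 For two vectors $u = [u_1\ u_2 \cdots u_n]^T$ and $v = [v_1\ v_2 \cdots v_n]^T$ in $\mathbb{C}^n$, with $u_k, v_k \in \mathbb{C}$, the dot (or Euclidean inner) product u · v of u and v is defined by
$$u \cdot v = \bar{u}_1v_1 + \bar{u}_2v_2 + \cdots + \bar{u}_nv_n = \bar{u}^Tv,$$
where $\bar{u} = [\bar{u}_1\ \bar{u}_2 \cdots \bar{u}_n]^T$ is the conjugate of u. The Euclidean length (or magnitude) of a vector $u \in \mathbb{C}^n$ is defined by
$$\|u\| = (u \cdot u)^{1/2} = \sqrt{|u_1|^2 + |u_2|^2 + \cdots + |u_n|^2},$$
where $|u_k|^2 = \bar{u}_ku_k$, and the distance between two vectors u and v in $\mathbb{C}^n$ is defined by
$$d(u, v) = \|u - v\|.$$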


In an (abstract) complex vector space, one can also define an inner product by adopting the basic properties of the Euclidean inner product on $\mathbb{C}^n$ as axioms.

Definition 7.2 A (complex) inner product (or Hermitian inner product) on a complex vector space V is a function that associates a complex number ⟨u, v⟩ with each pair of vectors u and v in V such that the following rules are satisfied for all vectors u, v and w in V and all scalars k in $\mathbb{C}$:
(1) $\langle u, v\rangle = \overline{\langle v, u\rangle}$,
(2) $\langle u + v, w\rangle = \langle u, w\rangle + \langle v, w\rangle$,
(3) $\langle ku, v\rangle = \bar{k}\langle u, v\rangle$: Sesquilinear in the first variable,
(4) $\langle v, v\rangle \geq 0$, and $\langle v, v\rangle = 0$ if and only if v = 0: Positive definiteness.




A finite dimensional complex vector space together with a Hermitian inner product is called a unitary space, or a Hilbert space. In particular, the n-space $\mathbb{C}^n$ with the dot product is called the (complex) Euclidean n-space.

Problem 7.1 Show that the Euclidean inner product on $\mathbb{C}^n$ satisfies all the inner product axioms.


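The following properties are immediate from the definition of an inner product:
(5) $\langle 0, v\rangle = \langle v, 0\rangle = 0$,
(6) $\langle u, v + w\rangle = \langle u, v\rangle + \langle u, w\rangle$,
(7) $\langle u, kv\rangle = k\langle u, v\rangle$.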


Remark: There is another way to define an inner product on a complex vector space. If we redefine the dot product u · v on the n-space $\mathbb{C}^n$ by
$$u \cdot v = u_1\bar{v}_1 + u_2\bar{v}_2 + \cdots + u_n\bar{v}_n,$$
then the third rule in Definition 7.2 should be modified to
(3') $\langle u, kv\rangle = \bar{k}\langle u, v\rangle$, so that $\langle ku, v\rangle = k\langle u, v\rangle$.
These two definitions, however, lead to no essential difference in a complex vector space.


In a unitary space, as in the real case, the length (or magnitude) of a vector u and the distance between two vectors u and v are defined by
$$\|u\| = \langle u, u\rangle^{1/2}, \qquad d(u, v) = \|u - v\|,$$
respectively.


Example 7.1 Let $C_{\mathbb{C}}[a, b]$ denote the set of all complex-valued continuous functions defined on [a, b]. Thus an element of $C_{\mathbb{C}}[a, b]$ is of the form $f(x) = f_1(x) + if_2(x)$, where $f_1(x)$ and $f_2(x)$ are real-valued and continuous on [a, b]. Note that f is continuous if and only if each component function $f_i$ is continuous. Clearly, the set $C_{\mathbb{C}}[a, b]$ is a complex vector space under the sum and scalar multiplication of functions. For a vector $f(x) = f_1(x) + if_2(x)$ in $C_{\mathbb{C}}[a, b]$, its integral is defined as follows:
$$\int_a^b f(x)\,dx = \int_a^b [f_1(x) + if_2(x)]\,dx = \int_a^b f_1(x)\,dx + i\int_a^b f_2(x)\,dx.$$




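It is an elementary exercise to show that, for vectors $f(x) = f_1(x) + if_2(x)$ and $g(x) = g_1(x) + ig_2(x)$ in the complex vector space $C_{\mathbb{C}}[a, b]$, the following formula defines an inner product on $C_{\mathbb{C}}[a, b]$:
$$\begin{aligned} \langle f, g\rangle &= \int_a^b \overline{f(x)}\,g(x)\,dx = \int_a^b [f_1(x) - if_2(x)][g_1(x) + ig_2(x)]\,dx \\ &= \int_a^b [f_1(x)g_1(x) + f_2(x)g_2(x)]\,dx + i\int_a^b [f_1(x)g_2(x) - f_2(x)g_1(x)]\,dx. \end{aligned}$$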


In a unitary space, the other basic concepts discussed up to Chapter 6, such as orthogonality, orthonormal bases and the Gram-Schmidt orthogonalization, remain the same. Moreover, if V is an n-dimensional complex vector space, then taking an orthonormal basis for V gives a natural isometry from V to $\mathbb{C}^n$ that preserves the inner product, as in the real case. Hence, without loss of generality, one may work only on $\mathbb{C}^n$ with the Euclidean inner product, and we use · and ⟨ , ⟩ interchangeably.


Example 7.2 Consider the complex Euclidean space $\mathbb{C}^3$. Apply the Gram-Schmidt orthogonalization to convert the basis $x_1 = (i, i, i)$, $x_2 = (0, i, i)$, $x_3 = (0, 0, i)$ into an orthonormal basis.


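Solution: Step 1: Set
$$u_1 = \frac{x_1}{\|x_1\|} = \frac{(i, i, i)}{\sqrt{3}} = \left(\frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}\right).$$
Step 2: Let $W_1$ denote the subspace spanned by $u_1$. Then
$$x_2 - \mathrm{Proj}_{W_1}x_2 = x_2 - \langle u_1, x_2\rangle u_1 = (0, i, i) - \frac{2}{\sqrt{3}}\left(\frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}\right) = \left(-\frac{2i}{3}, \frac{i}{3}, \frac{i}{3}\right).$$
Therefore,
$$u_2 = \frac{x_2 - \mathrm{Proj}_{W_1}x_2}{\|x_2 - \mathrm{Proj}_{W_1}x_2\|} = \frac{3}{\sqrt{6}}\left(-\frac{2i}{3}, \frac{i}{3}, \frac{i}{3}\right) = \left(-\frac{2i}{\sqrt{6}}, \frac{i}{\sqrt{6}}, \frac{i}{\sqrt{6}}\right).$$
Step 3: Let $W_2$ denote the subspace spanned by $\{u_1, u_2\}$. Then
$$\begin{aligned} x_3 - \mathrm{Proj}_{W_2}x_3 &= x_3 - \langle u_1, x_3\rangle u_1 - \langle u_2, x_3\rangle u_2 \\ &= (0, 0, i) - \frac{1}{\sqrt{3}}\left(\frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}\right) - \frac{1}{\sqrt{6}}\left(-\frac{2i}{\sqrt{6}}, \frac{i}{\sqrt{6}}, \frac{i}{\sqrt{6}}\right) \\ &= \left(0, -\frac{i}{2}, \frac{i}{2}\right). \end{aligned}$$
Therefore,
$$u_3 = \frac{x_3 - \mathrm{Proj}_{W_2}x_3}{\|x_3 - \mathrm{Proj}_{W_2}x_3\|} = \sqrt{2}\left(0, -\frac{i}{2}, \frac{i}{2}\right) = \left(0, -\frac{i}{\sqrt{2}}, \frac{i}{\sqrt{2}}\right).$$
Thus,
$$u_1 = \left(\frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}, \frac{i}{\sqrt{3}}\right), \quad u_2 = \left(-\frac{2i}{\sqrt{6}}, \frac{i}{\sqrt{6}}, \frac{i}{\sqrt{6}}\right), \quad u_3 = \left(0, -\frac{i}{\sqrt{2}}, \frac{i}{\sqrt{2}}\right)$$
form an orthonormal basis for $\mathbb{C}^3$.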
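The computation in Example 7.2 can be reproduced with Python's built-in complex numbers. The sketch below uses classical Gram-Schmidt, projecting the original $x_k$ as in the steps above. It conjugates the first argument of the inner product, as in Definition 7.1. The function names `inner` and `gram_schmidt` are illustrative only.

```python
import math

def inner(u, v):
    """Euclidean inner product on C^n, conjugating the first argument (Definition 7.1)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: subtract <u, x> u for each earlier u, then normalize."""
    basis = []
    for x in vectors:
        w = list(x)
        for u in basis:
            c = inner(u, x)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(inner(w, w).real)
        basis.append([wi / norm for wi in w])
    return basis

x1 = [1j, 1j, 1j]
x2 = [0, 1j, 1j]
x3 = [0, 0, 1j]
basis = gram_schmidt([x1, x2, x3])
u1, u2, u3 = basis
for u in basis:
    print(u)
```

Replacing `a.conjugate() * b` by `a * b` in `inner` would break the method: `inner(x1, x1)` would then be −3, which is not a squared length.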


Example 7.3 Let $C_{\mathbb{C}}[0, 2\pi]$ be the complex vector space with the inner product given in Example 7.1, and let W be the set of vectors in $C_{\mathbb{C}}[0, 2\pi]$ of the form
$$e^{ikx} = \cos kx + i\sin kx,$$
where k is an integer. The set W is orthogonal. In fact, if
$$g_k(x) = e^{ikx} \quad \text{and} \quad g_\ell(x) = e^{i\ell x}$$
are vectors in W, then
$$\langle g_k, g_\ell\rangle = \int_0^{2\pi}\overline{e^{ikx}}\,e^{i\ell x}\,dx = \int_0^{2\pi}e^{i(\ell - k)x}\,dx = \int_0^{2\pi}\cos(\ell - k)x\,dx + i\int_0^{2\pi}\sin(\ell - k)x\,dx,$$
which equals
$$\left[\frac{1}{\ell - k}\sin(\ell - k)x\right]_0^{2\pi} + i\left[-\frac{1}{\ell - k}\cos(\ell - k)x\right]_0^{2\pi} = 0 \quad \text{if } k \neq \ell, \qquad \int_0^{2\pi}dx = 2\pi \quad \text{if } k = \ell.$$
Thus, the vectors in W are orthogonal, and each vector has length $\sqrt{2\pi}$. By normalizing each vector in the orthogonal set W, one obtains an orthonormal set. Therefore, the vectors
$$f_k(x) = \frac{1}{\sqrt{2\pi}}e^{ikx}, \quad k = 0, \pm 1, \pm 2, \ldots,$$
form an orthonormal set in the complex vector space $C_{\mathbb{C}}[0, 2\pi]$.
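The orthogonality relations of Example 7.3 can be checked numerically. The sketch below approximates $\langle g_k, g_\ell\rangle = \int_0^{2\pi}\overline{g_k(x)}\,g_\ell(x)\,dx$ with an n-point rectangle rule. That quadrature choice is an assumption made here for illustration. For $e^{imx}$ with $0 < |m| < n$, its sum vanishes up to rounding, so the Gram table should be close to $2\pi$ on the diagonal and close to 0 elsewhere.

```python
import cmath
import math

def inner(f, g, n=64):
    """Approximate <f, g> = integral_0^{2 pi} conj(f(x)) g(x) dx by an n-point rectangle rule."""
    h = 2 * math.pi / n
    return h * sum(f(j * h).conjugate() * g(j * h) for j in range(n))

def g(k):
    """The vector g_k(x) = e^{ikx} of Example 7.3."""
    return lambda x: cmath.exp(1j * k * x)

gram = {(k, l): inner(g(k), g(l)) for k in range(-3, 4) for l in range(-3, 4)}
print(gram[(1, 1)], gram[(1, 2)])    # about 2*pi and about 0
```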


Problem 7.2 Prove that in a complex inner product space V,
(1) $|\langle x, y\rangle|^2 \leq \langle x, x\rangle\langle y, y\rangle$ (Cauchy-Schwarz inequality),
(2) $\|x + y\| \leq \|x\| + \|y\|$ (triangle inequality),
(3) $\|x + y\|^2 = \|x\|^2 + \|y\|^2$ if x and y are orthogonal (Pythagorean theorem).


The definitions of eigenvalues and eigenvectors in a complex vector space are the same as in the real case, but the eigenvalues can now be complex numbers. Hence, for any $n \times n$ (real or complex) matrix $A$, the characteristic polynomial $\det(\lambda I - A)$ always has $n$ complex roots (i.e., eigenvalues) counting multiplicities. For example, consider the rotation matrix

$$A = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

with real entries. This matrix has two complex eigenvalues $e^{\pm i\theta}$ for any $\theta \in \mathbb{R}$, but no real eigenvalues unless $\theta = k\pi$ for an integer $k$.

Therefore, all theorems and corollaries in Chapter 6 regarding eigenvalues and eigenvectors remain true without explicitly requiring the existence of $n$ eigenvalues. Exactly the same proofs as in the real case are valid, since the arguments in the proofs do not depend on what the scalars are. For example, one has theorems like 'for an $n \times n$ matrix $A$, the eigenvectors belonging to distinct eigenvalues are linearly independent' and 'if the $n$ eigenvalues of $A$ are distinct, then the eigenvectors belonging to them form a basis for $\mathbb{C}^n$, so that $A$ is diagonalizable'.


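An $n \times n$ real matrix $A$ can be considered as a linear transformation on both $\mathbb{R}^n$ and $\mathbb{C}^n$:

$$T : \mathbb{R}^n \to \mathbb{R}^n \text{ defined by } T(x) = Ax, \qquad S : \mathbb{C}^n \to \mathbb{C}^n \text{ defined by } S(x) = Ax.$$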


Since the entries are all real, the coefficients of the characteristic polynomial of $A$, $p(\lambda) = \det(\lambda I - A)$, are all real. Thus, if $\lambda$ is a root of $p(\lambda) = 0$, then its conjugate $\bar\lambda$ is also a root, because $p(\bar\lambda) = \overline{p(\lambda)} = 0$. In particular, any $n \times n$ real matrix $A$ has at least one real eigenvalue if $n$ is odd.

Moreover, if $x$ is an eigenvector belonging to a complex eigenvalue $\lambda$, then the complex conjugate $\bar x$ is an eigenvector belonging to $\bar\lambda$. In fact, if $Ax = \lambda x$ with $x \neq 0$, then

$$A\bar x = \overline{Ax} = \overline{\lambda x} = \bar\lambda \bar x,$$

where $\bar x$ denotes the vector whose entries are the complex conjugates of the corresponding entries of $x$.
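A quick numerical illustration of the conjugate-pair fact. This is a sketch that assumes NumPy. The rotation angle $\theta = \pi/3$ is an arbitrary choice for illustration.

```python
import numpy as np

# A real 2x2 matrix with no real eigenvalues: rotation by theta = pi/3 (illustrative choice).
theta = np.pi / 3
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

eigvals, eigvecs = np.linalg.eig(A)
lam, x = eigvals[0], eigvecs[:, 0]

# A x = lam x, and because A is real, A conj(x) = conj(lam) conj(x).
assert np.allclose(A @ x, lam * x)
assert np.allclose(A @ x.conj(), lam.conj() * x.conj())

# The eigenvalues form the conjugate pair e^{-i theta}, e^{i theta}.
pair = sorted(eigvals, key=lambda z: z.imag)
assert np.allclose(pair, [np.exp(-1j * theta), np.exp(1j * theta)])
```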

Using this fact, the following example shows that any $2 \times 2$ real matrix with no real eigenvalues is similar to a scalar multiple of a rotation matrix.


Example 7.4 Show that if $A$ is a $2 \times 2$ real matrix having no real eigenvalues, then $A$ is similar to a matrix of the form

$$\begin{bmatrix} r\cos\theta & r\sin\theta \\ -r\sin\theta & r\cos\theta \end{bmatrix}.$$


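Solution: Let $A$ be a $2 \times 2$ real matrix having no real eigenvalues. Let $\lambda = a + ib$ and $\bar\lambda = a - ib$, with $a, b \in \mathbb{R}$ and $b \neq 0$, be the two complex eigenvalues of $A$, with associated eigenvectors $x = u + iv$ and $\bar x = u - iv$, $u, v \in \mathbb{R}^2$, respectively. It follows immediately that

$$u = \frac12(x + \bar x), \quad v = -\frac{i}{2}(x - \bar x), \quad a = \frac12(\lambda + \bar\lambda), \quad b = -\frac{i}{2}(\lambda - \bar\lambda).$$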
Since $\lambda \neq \bar\lambda$, the eigenvectors $x$ and $\bar x$ are linearly independent in the complex vector space $\mathbb{C}^2$, and so are they when $\mathbb{C}^2$ is considered as a real vector space. This implies that the vectors $x + \bar x$ and $x - \bar x$ are linearly independent in the real vector space $\mathbb{C}^2$ (see Problem 7.3 below). Hence the real vectors $u$ and $v$ are linearly independent in the subspace $\mathbb{R}^2$ of the real vector space $\mathbb{C}^2$. Thus $\alpha = \{u, v\}$ is a basis for the real vector space $\mathbb{R}^2$, and


$$Au = \frac12(Ax + A\bar x) = \frac12(\lambda x + \bar\lambda \bar x) = \lambda\left(\frac{u + iv}{2}\right) + \bar\lambda\left(\frac{u - iv}{2}\right) = au - bv.$$


Similarly, one can get $Av = bu + av$. This implies that the matrix representation of the linear transformation $A : \mathbb{R}^2 \to \mathbb{R}^2$ with respect to the basis $\alpha$ is

$$[A]_\alpha = \begin{bmatrix} a & b \\ -b & a \end{bmatrix}.$$




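That is, any $2 \times 2$ real matrix that has no real eigenvalues is similar to a matrix of this form. Now, by setting $r = \sqrt{a^2 + b^2} > 0$, one can get $a = r\cos\theta$ and $b = r\sin\theta$ for some $\theta \in \mathbb{R}$, so

$$[A]_\alpha = \begin{bmatrix} r\cos\theta & r\sin\theta \\ -r\sin\theta & r\cos\theta \end{bmatrix}.$$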
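The construction in Example 7.4 can be sketched in NumPy as follows. This is an illustration under stated assumptions: the matrix `A` below, with eigenvalues $2 \pm i$, is our own choice, and `real_rotation_form` is a hypothetical helper name. The basis $\alpha = \{u, v\}$ is read off from the real and imaginary parts of an eigenvector.

```python
import numpy as np

def real_rotation_form(A):
    """For a real 2x2 A with non-real eigenvalues, return (lam, P, M) where
    lam = a + ib (b > 0), P = [u v] with x = u + iv an eigenvector for lam,
    and M = P^{-1} A P is the matrix of A with respect to alpha = {u, v}."""
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.imag)             # the eigenvalue with b > 0
    x = eigvecs[:, k]
    P = np.column_stack([x.real, x.imag])   # columns u and v
    M = np.linalg.solve(P, A @ P)           # [A]_alpha
    return eigvals[k], P, M

A = np.array([[1.0, -2.0], [1.0, 3.0]])     # illustrative: eigenvalues 2 +- i
lam, P, M = real_rotation_form(A)
a, b = lam.real, lam.imag
assert np.allclose(M, [[a, b], [-b, a]])     # Au = au - bv, Av = bu + av

r, theta = np.hypot(a, b), np.arctan2(b, a)
R = np.array([[np.cos(theta), np.sin(theta)], [-np.sin(theta), np.cos(theta)]])
assert np.allclose(M, r * R)
```

Any nonzero complex multiple of the eigenvector gives a different pair $u, v$, but the same matrix $[A]_\alpha$, because only $Ax = \lambda x$ is used.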


Problem 7.3 Let x and y be two vectors in a vector space V. Show that x and 
y are linearly independent if and only if x + y and x — y are linearly independent. 


Problem 7.4 Find the eigenvalues and the eigenvectors of 


pio a eS 
() j0 2 0j, (2) i—i 2 0 
SE iero i 


Problem 7.5 Prove that an $n \times n$ complex matrix $A$ is diagonalizable if and only if $A$ has $n$ linearly independent eigenvectors in the complex vector space $\mathbb{C}^n$.


7.2 Hermitian and unitary matrices 


Recall that the dot product of real vectors $x, y \in \mathbb{R}^n$ is given by $x \cdot y = x^T y$ in matrix form. For complex vectors $u, v \in \mathbb{C}^n$, the inner product is defined by $u \cdot v = \bar u_1 v_1 + \cdots + \bar u_n v_n = \bar u^T v$, which involves the conjugate transpose, not just the transpose.


Definition 7.3 For a complex matrix $A$, its complex conjugate transpose, $A^H = \bar A^T$, is called the Hermitian transpose of $A$.


Note that $\bar A$ is the matrix whose entries are the complex conjugates of the corresponding entries in $A$. Thus, $[A^H]_{ij} = \overline{[A]_{ji}}$. With this notation, the Euclidean inner product on $\mathbb{C}^n$ can be written as

$$u \cdot v = \bar u^T v = u^H v.$$
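In NumPy, which is assumed here, $A^H$ is `A.conj().T` and $u^H v$ is `np.vdot(u, v)`, since `np.vdot` conjugates its first argument. The small vectors and matrices below are arbitrary illustrations.

```python
import numpy as np

u = np.array([1 + 1j, 2 - 1j])
v = np.array([3j, 1 - 2j])

uH_v = u.conj() @ v                          # u^H v: conjugate the first factor
assert np.isclose(uH_v, np.vdot(u, v))       # np.vdot conjugates its first argument
assert np.isclose(np.vdot(u, u).real, np.linalg.norm(u) ** 2)   # u.u = ||u||^2

A = np.array([[1, 2j], [3 - 1j, 4]])
B = np.array([[0, 1j], [1, 1 + 1j]])
AH = A.conj().T                              # Hermitian transpose A^H
assert np.allclose(AH.conj().T, A)           # (A^H)^H = A
assert np.allclose((A @ B).conj().T, B.conj().T @ A.conj().T)   # (AB)^H = B^H A^H
```

Note that `u @ v` without the conjugate computes $u^T v$, which is not an inner product on $\mathbb{C}^n$.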


Problem 7.6 Show that $(AB)^H = B^H A^H$ when $AB$ is defined.


Problem 7.7 Prove that if $A$ is invertible, then so is $A^H$, and $(A^H)^{-1} = (A^{-1})^H$.


Definition 7.4 A complex square matrix $A$ is said to be Hermitian (or self-adjoint) if $A^H = A$, or skew-Hermitian if $A^H = -A$.




For example, the matrices 
2 441 i 1+i 
A= ; d B= : f 
Fe 3 | a ee —i L 
are Hermitian and skew-Hermitian, respectively.

Note that, for a real matrix $A$, $\bar A = A$ means $A^H = A^T$. Thus a real (skew-)symmetric matrix is (skew-)Hermitian, and conversely. Therefore, the notions of symmetric, skew-symmetric and orthogonal real matrices are replaced by Hermitian, skew-Hermitian and unitary matrices, respectively, in the complex case. The following theorems list some important properties of those matrices.


Theorem 7.1 Let $A$ be a Hermitian matrix.
(1) For any vector $x \in \mathbb{C}^n$, $x^H A x$ is real.


(2) Every eigenvalue of $A$ is real. In particular, an $n \times n$ real symmetric matrix has precisely $n$ real eigenvalues (counting multiplicities).


(3) The eigenvectors of A belonging to distinct eigenvalues are mutually 
orthogonal. 


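Proof: (1) Since $x^H A x$ is a $1 \times 1$ matrix, $\overline{x^H A x} = (x^H A x)^H = x^H A^H x = x^H A x$.
(2) If $Ax = \lambda x$ with $x \neq 0$, then $x^H A x = x^H \lambda x = \lambda x^H x = \lambda\|x\|^2$. The left side is real by (1), and $\|x\|^2$ is real and positive because $x \neq 0$. Therefore, $\lambda \in \mathbb{R}$.
(3) Let $Ax = \lambda x$ and $Ay = \mu y$ with $\lambda \neq \mu$. Because $A = A^H$ and $\lambda$ is real, it follows that

$$\lambda(x \cdot y) = (\lambda x) \cdot y = (Ax) \cdot y = x \cdot (Ay) = x \cdot (\mu y) = \mu(x \cdot y).$$

Since $\lambda \neq \mu$, this gives $x \cdot y = x^H y = 0$, i.e., $x$ is orthogonal to $y$.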


In particular, eigenvectors belonging to distinct eigenvalues of a real 
symmetric matrix are orthogonal. 


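Remark: For a matrix $A = [a_{ij}]$ whose off-diagonal entries satisfy $a_{ji} = \overline{a_{ij}}$, condition (1) in Theorem 7.1 is equivalent to saying that the diagonal entries of $A$ are real:

$$x^H A x = [\bar x_1 \cdots \bar x_n]\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn}\end{bmatrix}\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix} = \sum_{i,j} a_{ij}\bar x_i x_j = \sum_{i=1}^n a_{ii}|x_i|^2 + C + \bar C,$$

where $C = \sum_{i<j} a_{ij}\bar x_i x_j$. Since $C + \bar C$ is real, all $a_{ii} \in \mathbb{R}$ if and only if $x^H A x \in \mathbb{R}$ for any $x \in \mathbb{C}^n$.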


Problem 7.8 Prove that the determinant of any Hermitian matrix is real. 


Problem 7.9 Let $x$ be a nonzero vector in the complex vector space $\mathbb{C}^n$, and $A = xx^H$. Show that $A$ is Hermitian, and find all the eigenvalues and their eigenspaces for $A$.


It is easy to see that if $A$ is Hermitian, then the matrix $iA$ is skew-Hermitian; similarly, if $A$ is skew-Hermitian, then $iA$ is Hermitian. Therefore, the following theorem is a direct consequence of this fact and Theorem 7.1. The proof is left as an exercise.


Theorem 7.2 Let $A$ be a skew-Hermitian matrix.


(1) For any complex vector $x \neq 0$, $x^H A x$ is purely imaginary, and the diagonal entries of $A$ are purely imaginary.


(2) Every eigenvalue of $A$ is purely imaginary. In particular, a real skew-symmetric matrix has $n$ purely imaginary eigenvalues.


(3) The eigenvectors of A belonging to distinct eigenvalues are mutually 
orthogonal. 


Problem 7.10 Prove Theorem 7.2 by using Theorem 7.1, and prove (3) directly. 


Problem 7.11 Show that A = B+iC (B and C real matrices) is skew-Hermitian 
if and only if B is skew-symmetric and C is symmetric. 


Problem 7.12 Let $A$ and $B$ be either both Hermitian or both skew-Hermitian. Show that
(1) AB is Hermitian if and only if AB = BA. 
(2) AB is skew-Hermitian if and only if AB = —BA. 


Recall that a square matrix $Q$ with real entries is orthogonal if its column vectors are orthonormal (i.e., $Q^T Q = I$). The same is true for complex matrices (compare with Lemma 5.21).


Lemma 7.3 For a complex square matrix U, the following are equivalent: 


(1) the column vectors of U are orthonormal; 


(2) $U^H U = I$;
(3) $U^{-1} = U^H$;
(4) $UU^H = I$;


(5) the row vectors of U are orthonormal. 




Definition 7.5 A complex square matrix U is said to be unitary if it 
satisfies any one (and hence, all) of the conditions in Lemma 7.3. 


A unitary matrix is the complex analogue of an orthogonal matrix and, like an orthogonal matrix, it preserves the lengths of vectors.


Theorem 7.4 Let $U$ be an $n \times n$ unitary matrix.
(1) $U$ preserves the dot product on $\mathbb{C}^n$, i.e., for all $x$ and $y$ in $\mathbb{C}^n$,

$$(Ux) \cdot (Uy) = x \cdot y.$$

(2) If $\lambda$ is an eigenvalue of $U$, then $|\lambda| = 1$.
(3) The eigenvectors of $U$ belonging to distinct eigenvalues are mutually orthogonal.


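Proof: (1) $(Ux) \cdot (Uy) = (Ux)^H(Uy) = x^H U^H U y = x^H y = x \cdot y$.
(2) For $Ux = \lambda x$ with $x \neq 0$, $x^H x = (Ux)^H(Ux) = |\lambda|^2 x^H x$, so $|\lambda| = 1$.
(3) Let $Ux = \lambda x$, $Uy = \mu y$, and $\lambda \neq \mu$. Since $U$ is unitary, we have $\lambda\bar\lambda = 1 = \mu\bar\mu$, and $U^{-1}y = \mu^{-1}y = \bar\mu y$. Therefore,

$$\bar\lambda\, x^H y = (\lambda x)^H y = (Ux)^H y = x^H U^H y = x^H U^{-1} y = x^H(\bar\mu y) = \bar\mu\, x^H y$$

holds, and $\lambda \neq \mu$ implies $x^H y = 0$.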


By the same argument as in the proof of Theorem 5.22, $U$ preserves the dot product if and only if it preserves the lengths of vectors: $\|Ux\| = \|x\|$ for all $x$ in $\mathbb{C}^n$. Thus a unitary matrix is an isometry.
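The statements of Theorem 7.4 can be spot-checked numerically. The sketch below assumes NumPy. It obtains a unitary matrix from the QR factorization of a random complex matrix, which is only a convenient way to produce test data and is not part of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# The Q factor of a QR factorization of an invertible complex matrix is unitary.
Z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
U, _ = np.linalg.qr(Z)

x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

assert np.allclose(U.conj().T @ U, np.eye(n))                 # U^H U = I
assert np.isclose(np.vdot(U @ x, U @ y), np.vdot(x, y))       # (Ux).(Uy) = x.y
assert np.isclose(np.linalg.norm(U @ x), np.linalg.norm(x))   # ||Ux|| = ||x||
assert np.allclose(np.abs(np.linalg.eigvals(U)), 1.0)         # |lambda| = 1
```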


Theorem 7.5 A transition matrix from one orthonormal basis to another in a complex vector space is unitary.


Proof: Let $\alpha = \{v_1, \ldots, v_n\}$ and $\beta = \{w_1, \ldots, w_n\}$ be two orthonormal bases, and let $Q = [q_{ij}]$ be the transition matrix from the basis $\beta$ to the basis $\alpha$. By definition, $w_i = \sum_{k=1}^n q_{ki} v_k$. Thus,

$$\delta_{ij} = (w_i, w_j) = \left(\sum_{k=1}^n q_{ki}v_k, \sum_{\ell=1}^n q_{\ell j}v_\ell\right) = \sum_{k=1}^n\sum_{\ell=1}^n \bar q_{ki}\, q_{\ell j}\,(v_k, v_\ell) = \sum_{k=1}^n \bar q_{ki}\, q_{kj} = [Q^H Q]_{ij}.$$

This means that the columns of $Q$ are orthonormal and $Q$ is unitary.


Just as in the real case, two matrices representing the same linear transformation on a complex vector space with respect to different bases are similar. If the two bases are both orthonormal, then the transition matrix is unitary (orthogonal in the real case).


Problem 7.13 Show that | det U| = 1 for any unitary matrix U. 


Problem 7.14 Show that

$$A = \begin{bmatrix} \frac{1+i}{2} & \frac{1+i}{2} \\[1ex] \frac{1-i}{2} & \frac{-1+i}{2} \end{bmatrix}$$

is unitary but neither Hermitian nor skew-Hermitian.
Problem 7.15 Show that the adjoint of a unitary matrix is unitary. 
Problem 7.16 Show that if A and B are unitary, so is AB. 


Problem 7.17 Describe all 3 x 3 matrices that are simultaneously Hermitian, 
unitary, and diagonal. How many are there? 


7.3 Unitary diagonalization 


In the previous section, it was shown that if an $n \times n$ square matrix $A$ is Hermitian, skew-Hermitian or unitary, then the eigenvectors belonging to distinct eigenvalues are mutually orthogonal. Hence, if such a matrix $A$ has $n$ distinct eigenvalues, then there exists an orthonormal basis $\alpha$ for $\mathbb{C}^n$ consisting of eigenvectors of $A$, so that the matrix representation $[A]_\alpha$ is diagonal, i.e., $A$ is diagonalizable by a unitary matrix. In this section, it will be shown that any Hermitian, skew-Hermitian or unitary matrix has $n$ orthonormal eigenvectors even if the eigenvalues are not all distinct. In particular, it is always diagonalizable by a unitary matrix.


Definition 7.6 (1) Two real matrices $A$ and $B$ are orthogonally similar if there exists an orthogonal matrix $P$ such that $P^{-1}AP = B$. A matrix is orthogonally diagonalizable if it is orthogonally similar to a diagonal matrix.

(2) Two complex matrices $A$ and $B$ are unitarily similar if there exists a unitary matrix $U$ such that $U^{-1}AU = B$. A matrix is unitarily diagonalizable if it is unitarily similar to a diagonal matrix.


Concerning orthogonal and unitary similarity, we begin with a classical theorem due to Schur (1909), which is a cornerstone in the study of complex matrices.


Lemma 7.6 (Schur's lemma) (1) If an $n \times n$ real matrix $A$ has only real eigenvalues, then $A$ is orthogonally similar to an upper triangular matrix.


(2) Every $n \times n$ complex matrix is unitarily similar to an upper triangular matrix.


Proof: We prove only the second assertion (2), by mathematical induction on $n$, because (1) can be done in a similar way. Clearly, it is true for $n = 1$. Assume now that assertion (2) holds for $n = r - 1$. Let $A$ be any $r \times r$ complex matrix, and let $\lambda_1$ be an eigenvalue of $A$ with a normalized eigenvector $x$. Extend $x$ to an orthonormal basis $\{x, u_2, \ldots, u_r\}$ for $\mathbb{C}^r$ by the Gram-Schmidt orthogonalization. Let $U_1 = [x \; u_2 \cdots u_r]$ be the unitary matrix with these basis vectors as its columns. A direct computation of the product $U_1^{-1}AU_1$ shows


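$$U_1^{-1}AU_1 = U_1^H A U_1 = U_1^H\,[Ax \;\; Au_2 \;\cdots\; Au_r] = \begin{bmatrix} x^H \\ u_2^H \\ \vdots \\ u_r^H \end{bmatrix}\begin{bmatrix} \lambda_1 x & Au_2 & \cdots & Au_r \end{bmatrix} = \begin{bmatrix} \lambda_1 & * \\ 0 & B \end{bmatrix},$$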

where $B$ is an $(r-1) \times (r-1)$ matrix. By the inductive hypothesis, there exists an $(r-1) \times (r-1)$ unitary matrix $U_2$ such that $U_2^H B U_2$ is an upper triangular matrix with diagonal entries $\lambda_2, \lambda_3, \ldots, \lambda_r$. Define

$$U = U_1\begin{bmatrix} 1 & 0 \\ 0 & U_2 \end{bmatrix}.$$




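Then it is easy to check that $U$ is also a unitary matrix, and

$$U^{-1}AU = U^H A U = \begin{bmatrix} \lambda_1 & * \\ 0 & U_2^H B U_2 \end{bmatrix} = \begin{bmatrix} \lambda_1 & & * \\ & \ddots & \\ 0 & & \lambda_r \end{bmatrix}.$$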


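Now, for (1), the same argument can be used in $\mathbb{R}^n$: since all the eigenvalues are real, the eigenvector $x$ and the orthonormal basis can be chosen to be real, and then all the unitary matrices above are orthogonal.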
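The inductive proof translates almost line by line into a recursive routine. The NumPy sketch below is for illustration only and is not a robust algorithm. It makes two substitutions: the function name `schur_by_induction` is hypothetical, and the Gram-Schmidt extension step is replaced by a QR factorization of $[x \mid I]$, whose first column is a unit-modulus multiple of $x$ and therefore still a normalized eigenvector. In practice one would call a library routine such as `scipy.linalg.schur(A, output='complex')`.

```python
import numpy as np

def schur_by_induction(A):
    """Follow the inductive proof of Schur's lemma: return (U, T) with U unitary
    and T = U^H A U upper triangular (up to rounding)."""
    A = np.asarray(A, dtype=complex)
    r = A.shape[0]
    if r == 1:
        return np.eye(1, dtype=complex), A.copy()
    # An eigenpair (lambda_1, x) of A with ||x|| = 1.
    _, vecs = np.linalg.eig(A)
    x = vecs[:, 0] / np.linalg.norm(vecs[:, 0])
    # Extend x to an orthonormal basis of C^r, giving U1 = [x u_2 ... u_r].
    U1, _ = np.linalg.qr(np.column_stack([x, np.eye(r)]))
    C = U1.conj().T @ A @ U1               # = [[lambda_1, *], [0, B]]
    U2, _ = schur_by_induction(C[1:, 1:])  # inductive hypothesis applied to B
    W = np.eye(r, dtype=complex)
    W[1:, 1:] = U2                         # W = [[1, 0], [0, U2]]
    U = U1 @ W
    return U, U.conj().T @ A @ U

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
U, T = schur_by_induction(A)
assert np.allclose(U.conj().T @ U, np.eye(4))       # U is unitary
assert np.allclose(np.tril(T, -1), 0, atol=1e-10)   # T is upper triangular
assert np.allclose(U @ T @ U.conj().T, A)           # A = U T U^H
```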


Theorem 7.7 If $A$ is either a Hermitian, a skew-Hermitian or a unitary matrix, then it is unitarily diagonalizable.


Proof: By Schur's lemma, $U^H A U = B$ is an upper triangular matrix for some unitary matrix $U$. However,

$$B^H = U^H A^H U = \begin{cases} U^H(\pm A)U = \pm B & \text{if } A \text{ is (skew-)Hermitian},\\ U^{-1}A^{-1}U = B^{-1} & \text{if } A \text{ is unitary}, \end{cases}$$


where the right-hand side depends on whether $A$ is Hermitian, skew-Hermitian or unitary. This means that the upper triangular matrix $B$ is of the same type (Hermitian, skew-Hermitian or unitary) as $A$.

Note that $B^H$ is a lower triangular matrix and $B^{-1}$ is an upper triangular matrix, because $B$ is upper triangular. Therefore, the upper triangular matrix $B$ must be a diagonal matrix in each case: Hermitian, skew-Hermitian or unitary.


Note that, in the similarity relation $U^{-1}AU\,(= U^H A U) = D$ of $A$ to a diagonal matrix $D$ through a unitary matrix $U$, the equation $AU = UD$ shows that the column vectors of $U$ constitute a set of $n$ orthonormal eigenvectors of $A$, while the diagonal entries of $D$ are eigenvalues of $A$, as shown in Theorem 6.1. Therefore, by Theorems 7.1, 7.2 and 7.4, all the diagonal entries of $D$ are real, purely imaginary or of modulus 1, depending on whether $A$ is Hermitian, skew-Hermitian or unitary, respectively.


Example 7.5 Diagonalize the matrix

$$A = \begin{bmatrix} 2 & 1-i \\ 1+i & 1 \end{bmatrix}$$

by a unitary matrix.




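Solution: Since $A$ is Hermitian, it is unitarily diagonalizable. One can show that $A$ has the eigenvalues $\lambda_1 = 3$ and $\lambda_2 = 0$, with eigenvectors $x_1 = (1 - i, 1)$ and $x_2 = (-1, 1 + i)$, respectively. Let

$$u_1 = \frac{x_1}{\|x_1\|} = \frac{1}{\sqrt3}(1 - i, 1), \qquad u_2 = \frac{x_2}{\|x_2\|} = \frac{1}{\sqrt3}(-1, 1 + i),$$

and let

$$U = \frac{1}{\sqrt3}\begin{bmatrix} 1 - i & -1 \\ 1 & 1 + i \end{bmatrix}.$$

Then $U$ is a unitary matrix and diagonalizes $A$:

$$U^H A U = \frac13\begin{bmatrix} 1+i & 1 \\ -1 & 1-i \end{bmatrix}\begin{bmatrix} 2 & 1-i \\ 1+i & 1 \end{bmatrix}\begin{bmatrix} 1-i & -1 \\ 1 & 1+i \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 0 \end{bmatrix}.$$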
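The arithmetic of Example 7.5 is easy to confirm with NumPy, which is assumed here. `np.linalg.eigvalsh` is NumPy's eigenvalue routine for Hermitian matrices and returns real eigenvalues in ascending order.

```python
import numpy as np

A = np.array([[2, 1 - 1j], [1 + 1j, 1]])
U = np.array([[1 - 1j, -1], [1, 1 + 1j]]) / np.sqrt(3)   # columns u1, u2

assert np.allclose(A, A.conj().T)                        # A is Hermitian
assert np.allclose(U.conj().T @ U, np.eye(2))            # U is unitary
assert np.allclose(U.conj().T @ A @ U, np.diag([3, 0]))  # U^H A U = diag(3, 0)
assert np.allclose(np.linalg.eigvalsh(A), [0, 3])        # real eigenvalues 0 and 3
```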


Since all real symmetric matrices are Hermitian, they are unitarily diagonalizable by Theorem 7.7. However, the following theorem says more than that.


Theorem 7.8 For any $n \times n$ real matrix $A$, the following are equivalent.
(1) A is symmetric. 
(2) A is orthogonally diagonalizable. 


(3) A has a full set of n orthonormal eigenvectors. 


Proof: (1) $\Rightarrow$ (2): If $A$ is real and symmetric, then it is a Hermitian matrix, so it has only real eigenvalues. By Schur's lemma 7.6, $A$ is orthogonally similar to an upper triangular matrix $Q^TAQ$, which is again symmetric and hence already diagonal. Hence $A$ is orthogonally diagonalizable.

(2) $\Rightarrow$ (3): If $A$ is diagonalized by an orthogonal matrix $Q$, then the column vectors of $Q$ are $n$ orthonormal eigenvectors of $A$.

(3) $\Rightarrow$ (1): If $A$ has a full set of $n$ orthonormal eigenvectors, then these eigenvectors form an orthogonal transition matrix $Q$ such that $AQ = QD$. It is now easy to show that $A = QDQ^{-1} = QDQ^T$ is symmetric.


Hence, all real symmetric matrices are diagonalizable, and even orthogonally so. Now, Theorem 6.5 implies that if $\lambda$ is an eigenvalue of multiplicity $m_\lambda$ of a real symmetric matrix $A$, then $\dim E_\lambda = \dim N(\lambda I - A) = m_\lambda$. Moreover, symmetric matrices are the only real matrices that can be orthogonally diagonalized.

To obtain $n$ orthonormal eigenvectors, it is enough to find an orthonormal basis for each eigenspace by the Gram-Schmidt orthogonalization, since eigenvectors belonging to distinct eigenvalues are already orthogonal.

Hence, the procedure for orthogonal diagonalization of a symmetric ma- 
trix A can be summarized as follows. 


Step 1 Find a basis for each eigenspace of A. 


Step 2 Apply the Gram-Schmidt orthogonalization to each of these bases to 
obtain an orthonormal basis for each eigenspace. 


Step 3 Form the matrix Q whose columns are the basis vectors constructed 
in Step 2; this matrix orthogonally diagonalizes A. 


Example 7.6 Find an orthogonal matrix $Q$ that diagonalizes the symmetric matrix

$$A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}.$$


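Solution: The characteristic polynomial of $A$ is

$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda-4 & -2 & -2 \\ -2 & \lambda-4 & -2 \\ -2 & -2 & \lambda-4 \end{bmatrix} = (\lambda - 2)^2(\lambda - 8).$$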


Thus, the eigenvalues of $A$ are $\lambda = 2$ (of multiplicity 2) and $\lambda = 8$. By the method used in Example 6.2, it can be shown that


$$x_1 = (-1, 1, 0) \quad\text{and}\quad x_2 = (-1, 0, 1)$$


form a basis for the eigenspace belonging to $\lambda = 2$. Applying the Gram-Schmidt orthogonalization to $\{x_1, x_2\}$ yields the following orthonormal eigenvectors (verify):

$$u_1 = \frac{1}{\sqrt2}(-1, 1, 0), \qquad u_2 = \frac{1}{\sqrt6}(-1, -1, 2).$$


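The eigenspace belonging to $\lambda = 8$ has $x_3 = (1, 1, 1)$ as a basis. The normalization of $x_3$ yields $u_3 = \frac{1}{\sqrt3}(1, 1, 1)$. Finally, using $u_1$, $u_2$ and $u_3$ as column vectors, one can obtain

$$Q = \begin{bmatrix} -\frac{1}{\sqrt2} & -\frac{1}{\sqrt6} & \frac{1}{\sqrt3} \\[1ex] \frac{1}{\sqrt2} & -\frac{1}{\sqrt6} & \frac{1}{\sqrt3} \\[1ex] 0 & \frac{2}{\sqrt6} & \frac{1}{\sqrt3} \end{bmatrix},$$

which orthogonally diagonalizes $A$. (The reader is encouraged to verify that $Q^T A Q$ is actually a diagonal matrix.)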
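Steps 1 to 3 of the procedure can be carried out mechanically. The NumPy sketch below takes the eigenspace bases found above as input and applies Gram-Schmidt to each one separately. The helper name `gram_schmidt_columns` is our own, and the eigenspace bases are hard-coded rather than computed.

```python
import numpy as np

A = np.array([[4.0, 2.0, 2.0],
              [2.0, 4.0, 2.0],
              [2.0, 2.0, 4.0]])

# Step 1: a basis (as columns) for each eigenspace, taken from the solution above.
eigenspace_bases = {
    2.0: np.array([[-1.0, 1.0, 0.0], [-1.0, 0.0, 1.0]]).T,   # x1, x2
    8.0: np.array([[1.0, 1.0, 1.0]]).T,                       # x3
}

def gram_schmidt_columns(X):
    """Orthonormalize the columns of a real matrix X (classical Gram-Schmidt)."""
    basis = []
    for x in X.T:
        w = x - sum((q @ x) * q for q in basis)   # remove components along earlier q's
        basis.append(w / np.linalg.norm(w))
    return np.column_stack(basis)

blocks, diag = [], []
for lam, X in eigenspace_bases.items():
    assert np.allclose(A @ X, lam * X)        # the columns really lie in E_lam
    blocks.append(gram_schmidt_columns(X))    # Step 2: orthonormal basis of E_lam
    diag += [lam] * X.shape[1]
Q = np.hstack(blocks)                         # Step 3: assemble Q column by column

assert np.allclose(Q.T @ Q, np.eye(3))        # Q is orthogonal
assert np.allclose(Q.T @ A @ Q, np.diag(diag))
```

Orthonormalizing each eigenspace separately is enough, because eigenvectors belonging to different eigenvalues of a symmetric matrix are already orthogonal.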


Example 7.7 The matrix

$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & i & 0 \\ 0 & 0 & x \end{bmatrix}, \qquad x \in \mathbb{R} \text{ with } |x| \neq 1,$$

is neither Hermitian, skew-Hermitian, nor unitary, but it is a diagonal matrix. Hence, there are infinitely many unitarily diagonalizable matrices that are neither Hermitian, skew-Hermitian, nor unitary.

Problem 7.18 For matrices 


: 2i 0 0 
1 i : r 
E a al 


find a unitary matrix $U$ such that $U^{-1}AU$ is an upper triangular matrix.


Even though symmetric matrices are all orthogonally diagonalizable, certain non-symmetric matrices may still have a full set of linearly independent eigenvectors, so that they are diagonalizable; in this case, however, the eigenvectors cannot be chosen to be orthonormal. That is, the transition matrix $Q$ cannot be an orthogonal matrix. For example, any triangular matrix having all distinct diagonal entries is diagonalizable, because its eigenvalues are all distinct, but it is not orthogonally diagonalizable unless it is already diagonal (i.e., symmetric).
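Here is a small numerical instance of this remark, using NumPy. The $2 \times 2$ triangular matrix is our own illustrative choice. Its eigenvectors $(1, 0)$ and $\frac{1}{\sqrt2}(1, 1)$ are independent but not orthogonal.

```python
import numpy as np

T = np.array([[1.0, 1.0], [0.0, 2.0]])   # upper triangular, distinct diagonal entries
eigvals, X = np.linalg.eig(T)             # columns of X are unit eigenvectors

assert np.allclose(np.sort(eigvals), [1.0, 2.0])
assert abs(np.linalg.det(X)) > 1e-8                  # full set of independent eigenvectors
assert np.allclose(T @ X, X @ np.diag(eigvals))      # so T is diagonalizable
# The eigenvectors (1, 0) and (1, 1)/sqrt(2) are not orthogonal,
# so the transition matrix X cannot be orthogonal.
assert not np.isclose(X[:, 0] @ X[:, 1], 0.0)
```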


Problem 7.19 Show that the non-symmetric matrices 


1 0 ee z 0 1 
(j)A=|0 1 0 (2)A=|0 z+1 0 |,2eER 
00 2 0 0 «+2 


are diagonalizable, but not orthogonally. 




7.4 Normal matrices 


We have seen that Hermitian, skew-Hermitian and unitary matrices are all unitarily diagonalizable. However, there are other kinds of matrices that are unitarily diagonalizable, whereas among the real matrices, the symmetric matrices are the only ones that are orthogonally diagonalizable. It turns out that all the unitarily diagonalizable matrices belong to the following class of matrices, called normal matrices.


Definition 7.7 A complex square matrix $A$ is said to be normal if $AA^H = A^H A$.


Note that all Hermitian, skew-Hermitian and unitary matrices are normal. But there are infinitely many matrices that are normal yet are none of these:


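Example 7.8 The $2 \times 2$ matrix

$$A = \begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}$$

is normal, but it is neither Hermitian, skew-Hermitian, unitary, nor a diagonal matrix. However, one can easily check that this matrix is unitarily diagonalizable. In fact,

$$\frac{1}{\sqrt2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & i \\ i & 1 \end{bmatrix}\frac{1}{\sqrt2}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \begin{bmatrix} 1+i & 0 \\ 0 & 1-i \end{bmatrix}.$$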


Problem 7.20 Which of the following matrices are Hermitian, skew-Hermitian, unitary or normal?


Ole kE [i 


1 
1 
ý ETa 
0 i 3 1+72 i 
liai of rf ofi 1 Ji 


As a matter of fact, it will be shown that the normal matrices are exactly the unitarily diagonalizable matrices. We begin with a lemma.




Lemma 7.9 If an upper triangular matrix $T$ is normal, then it must be a diagonal matrix.




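Proof: Use induction on $k$, comparing the diagonal $(k,k)$-entries of both sides of $TT^H = T^H T$:

$$\begin{bmatrix} t_{11} & \cdots & t_{1n} \\ & \ddots & \vdots \\ 0 & & t_{nn}\end{bmatrix}\begin{bmatrix} \bar t_{11} & & 0 \\ \vdots & \ddots & \\ \bar t_{1n} & \cdots & \bar t_{nn}\end{bmatrix} = \begin{bmatrix} \bar t_{11} & & 0 \\ \vdots & \ddots & \\ \bar t_{1n} & \cdots & \bar t_{nn}\end{bmatrix}\begin{bmatrix} t_{11} & \cdots & t_{1n} \\ & \ddots & \vdots \\ 0 & & t_{nn}\end{bmatrix}.$$

If $k = 1$, the equality of

$$[TT^H]_{11} = |t_{11}|^2 + \cdots + |t_{1n}|^2 \quad\text{and}\quad [T^H T]_{11} = |t_{11}|^2$$

implies $t_{12} = \cdots = t_{1n} = 0$. Inductively, assume that $t_{i,i+1} = \cdots = t_{in} = 0$ has been shown for $i = 1, \ldots, k-1$. Then

$$[TT^H]_{kk} = |t_{kk}|^2 + |t_{k,k+1}|^2 + \cdots + |t_{kn}|^2,$$
$$[T^H T]_{kk} = |t_{1k}|^2 + \cdots + |t_{k-1,k}|^2 + |t_{kk}|^2 = |t_{kk}|^2,$$

because $t_{1k} = \cdots = t_{k-1,k} = 0$ by the induction hypothesis. But $TT^H = T^H T$ yields $t_{k,k+1} = \cdots = t_{kn} = 0$. Thus, this holds for all $k = 1, \ldots, n$, which means that all the entries off the diagonal of $T$ are zero, i.e., $T$ is a diagonal matrix.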


Theorem 7.10 For any complex square matrix $A$, the following are equivalent:


(1) A is normal; 
(2) A is unitarily diagonalizable; 


(3) A has a full set of n orthonormal eigenvectors. 


Proof: (1) $\Rightarrow$ (2): Suppose that $A$ is normal. By Schur's lemma, there exists a unitary matrix $U$ such that $T = U^H A U$ is an upper triangular matrix. Then $T$ is also normal, since

$$TT^H = U^H A U U^H A^H U = U^H A A^H U = U^H A^H A U = U^H A^H U U^H A U = T^H T.$$

Thus, by Lemma 7.9, $T$ is already diagonal, so that $A$ is unitarily diagonalizable.

(2) $\Rightarrow$ (3): It is clear that the columns of the transition matrix $U$ are $n$ orthonormal eigenvectors of $A$.




(3) $\Rightarrow$ (1): Let $U$ be the unitary matrix whose columns are the $n$ orthonormal eigenvectors. Then $AU = UD$, or $A = UDU^H$, and

$$AA^H = UDU^H U D^H U^H = UDD^H U^H = UD^H D U^H = UD^H U^H U D U^H = A^H A.$$

That is, $A$ is normal.


Note that there exist infinitely many non-normal complex matrices that are still diagonalizable, but of course not unitarily. One can find such examples among the triangular matrices having distinct diagonal entries.

Recall that any $n \times n$ real matrix $A$ can be written as the sum $S + T$ of a symmetric matrix $S = \frac12(A + A^T)$ and a skew-symmetric matrix $T = \frac12(A - A^T)$. The same kind of expression is also possible for a complex matrix. A complex matrix $A$ can be written as the sum $A = H_1 + iH_2$, where

$$H_1 = \frac12(A + A^H), \qquad H_2 = -\frac{i}{2}(A - A^H); \quad\text{or}\quad iH_2 = \frac12(A - A^H).$$

Clearly both $H_1$ and $H_2$ are Hermitian, and $iH_2$ is skew-Hermitian.


Problem 7.21 Show that the matrix A = : | , « ER, is not normal, so it 


cannot be unitarily diagonalizable. But it is diagonalizable. 


Problem 7.22 Determine whether or not the matrix 


is unitarily diagonalizable. If it is, find a unitary matrix U that diagonalizes A. 


Problem 7.23 Let $H_1$ and $H_2$ be two Hermitian matrices. Show that $A = H_1 + iH_2$ is normal if and only if $H_1 H_2 = H_2 H_1$.


Problem 7.24 For any unitarily diagonalizable matrix A, prove that 
(1) A is Hermitian if and only if A has only real eigenvalues; 
(2) A is skew-Hermitian if and only if A has only purely imaginary eigenvalues; 
(3) $A$ is unitary if and only if $|\lambda| = 1$ for any eigenvalue $\lambda$ of $A$.




7.5 The spectral decomposition 


As shown in the previous section, the normal matrices are the only matrices that can be unitarily diagonalized. That is, $A$ is normal if and only if there exists a basis $\alpha$ for $\mathbb{C}^n$ consisting of orthonormal eigenvectors of $A$ such that the matrix representation $[A]_\alpha$ of $A$ with respect to $\alpha$ is diagonal.


Theorem 7.11 (Spectral theorem) Let $A$ be a normal matrix, and let $\{u_1, u_2, \ldots, u_n\}$ be a set of orthonormal eigenvectors belonging to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $A$, respectively. Then $A$ can be written as

$$A = UDU^H = \lambda_1 P_1 + \lambda_2 P_2 + \cdots + \lambda_n P_n,$$

where $P_i = u_i u_i^H$ is the orthogonal projection matrix onto the subspace spanned by the eigenvector $u_i$, for $i = 1, \ldots, n$.


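Proof: Note that $U = [u_1 \; u_2 \cdots u_n]$ is a unitary matrix that transforms $A$ into a diagonal matrix $D$, i.e., $U^{-1}AU = U^H A U = D$. Then

$$A = UDU^H = [\lambda_1 u_1 \;\; \lambda_2 u_2 \;\cdots\; \lambda_n u_n]\begin{bmatrix} u_1^H \\ u_2^H \\ \vdots \\ u_n^H \end{bmatrix} = \lambda_1 u_1 u_1^H + \lambda_2 u_2 u_2^H + \cdots + \lambda_n u_n u_n^H = \lambda_1 P_1 + \lambda_2 P_2 + \cdots + \lambda_n P_n,$$

where

$$P_i = u_i u_i^H = \begin{bmatrix} u_{1i} \\ \vdots \\ u_{ni}\end{bmatrix}\begin{bmatrix} \bar u_{1i} & \cdots & \bar u_{ni}\end{bmatrix} = \begin{bmatrix} |u_{1i}|^2 & \cdots & u_{1i}\bar u_{ni} \\ \vdots & \ddots & \vdots \\ u_{ni}\bar u_{1i} & \cdots & |u_{ni}|^2 \end{bmatrix}.$$

Clearly, $P_i^H = P_i$. Since $u_i^H u_j = (u_i, u_j) = \delta_{ij}$, we have $P_i^2 = P_i$ for $i = 1, \ldots, n$. Therefore, by Theorem 5.13, each $P_i$ is the orthogonal projection onto the subspace spanned by the eigenvector $u_i$.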
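The projections $P_i = u_i u_i^H$ and their properties can be checked numerically. This NumPy sketch assumes the Hermitian (hence normal) matrix of Example 7.5 as test data. It uses `np.linalg.eigh`, which returns orthonormal eigenvectors for Hermitian input. The checks cover $P_i^H = P_i$, $P_i^2 = P_i$, $P_iP_j = 0$ for $i \neq j$, $\sum P_i = I$, and $\sum \lambda_i P_i = A$.

```python
import numpy as np

A = np.array([[2, 1 - 1j], [1 + 1j, 1]])      # Hermitian, hence normal
lam, U = np.linalg.eigh(A)                     # orthonormal eigenvectors as columns

# P_i = u_i u_i^H  (np.outer does not conjugate, so conjugate the second factor).
P = [np.outer(U[:, i], U[:, i].conj()) for i in range(len(lam))]

for i, Pi in enumerate(P):
    assert np.allclose(Pi.conj().T, Pi)        # P_i^H = P_i
    assert np.allclose(Pi @ Pi, Pi)            # P_i^2 = P_i
    for j, Pj in enumerate(P):
        if i != j:
            assert np.allclose(Pi @ Pj, 0)     # P_i P_j = 0 for i != j
assert np.allclose(sum(P), np.eye(2))          # P_1 + ... + P_n = I
assert np.allclose(sum(l * Pi for l, Pi in zip(lam, P)), A)   # A = sum lambda_i P_i
```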


Note that $P_iP_j = 0$ if $i \neq j$. Moreover, for any $x \in \mathbb{C}^n$,

$$(P_1 + \cdots + P_n)x = P_1x + \cdots + P_nx = \sum_{i=1}^n u_i(u_i^H x) = x = \mathrm{Id}(x).$$


This means that if one restricts the image of each $P_i$ to the subspace spanned by $u_i$, which is isomorphic to $\mathbb{C}$, then $(P_1, \ldots, P_n)$ defines an orthogonal coordinate system with respect to the orthonormal basis of eigenvectors $\{u_1, u_2, \ldots, u_n\}$. This is just like $(x_1, \ldots, x_n)$ with respect to the standard orthonormal basis $\{e_1, e_2, \ldots, e_n\}$ for $\mathbb{C}^n$ (see Section 3.3).

Note that any $x \in \mathbb{C}^n$ can be written as a linear combination of the orthonormal basis $\{u_i\}$ of eigenvectors: $x = \sum_i (u_i, x)u_i$. By the spectral theorem,

$$Ax = \lambda_1 P_1 x + \lambda_2 P_2 x + \cdots + \lambda_n P_n x = \lambda_1 u_1(u_1^H x) + \cdots + \lambda_n u_n(u_n^H x) = \lambda_1 (u_1, x)u_1 + \cdots + \lambda_n (u_n, x)u_n.$$


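If an eigenvalue $\lambda$ has multiplicity $\ell$, i.e., $\lambda = \lambda_{i_1} = \cdots = \lambda_{i_\ell}$, with a set of $\ell$ orthonormal eigenvectors $u_{i_1}, \ldots, u_{i_\ell}$, then these eigenvectors form an orthonormal basis for the eigenspace $E_\lambda$, and

$$P_\lambda = P_{i_1} + \cdots + P_{i_\ell}$$

is the orthogonal projection matrix onto $E_\lambda = N(\lambda I - A)$. Therefore, grouping the terms that belong to the same eigenvalue, every normal matrix $A$ has a unique spectral decomposition into projections

$$A = \lambda_1 P_{\lambda_1} + \cdots + \lambda_k P_{\lambda_k}, \qquad A^H = \bar\lambda_1 P_{\lambda_1} + \cdots + \bar\lambda_k P_{\lambda_k},$$

for some $k \le n$, where the $\lambda_j$'s are all distinct.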


Corollary 7.12 Let $A$ be a normal matrix.


(1) The eigenvectors of $A$ belonging to distinct eigenvalues are mutually orthogonal.


(2) If an eigenvalue $\lambda$ of $A$ has multiplicity $k$, then the eigenspace $N(A - \lambda I)$ belonging to $\lambda$ is of dimension $k$.


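Corollary 7.13 Let $A$ be a normal matrix with the spectral decomposition $A = \lambda_1 P_{\lambda_1} + \cdots + \lambda_k P_{\lambda_k}$. Then, for any positive integer $m$,

$$A^m = \lambda_1^m P_{\lambda_1} + \cdots + \lambda_k^m P_{\lambda_k}.$$

Moreover, if $A$ is invertible, then for any positive integer $\ell$,

$$A^{-\ell} = \frac{1}{\lambda_1^\ell} P_{\lambda_1} + \cdots + \frac{1}{\lambda_k^\ell} P_{\lambda_k}.$$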
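Corollary 7.13 can be illustrated with NumPy. The helper `spectral_projections` below is a hypothetical name. It is restricted to Hermitian (or real symmetric) input so that `np.linalg.eigh` supplies orthonormal eigenvectors, and it groups eigenvalues that agree up to a tolerance in order to form the eigenspace projections $P_{\lambda_j}$. The test matrix is the one of Example 7.9 below.

```python
import numpy as np

def spectral_projections(A, tol=1e-9):
    """For Hermitian A, return a list of [lambda_j, P_lambda_j] over the distinct eigenvalues."""
    lam, U = np.linalg.eigh(A)                 # ascending eigenvalues, orthonormal columns
    groups = []
    for l, u in zip(lam, U.T):
        P = np.outer(u, u.conj())              # u u^H
        if groups and abs(l - groups[-1][0]) < tol:
            groups[-1][1] += P                 # same eigenvalue: add to P_lambda
        else:
            groups.append([l, P])
    return groups

A = np.array([[4.0, 2.0, 2.0], [2.0, 4.0, 2.0], [2.0, 2.0, 4.0]])
groups = spectral_projections(A)
assert np.allclose([l for l, _ in groups], [2.0, 8.0])

m = 5
A_m = sum(l**m * P for l, P in groups)                  # A^m = sum lambda_j^m P_lambda_j
assert np.allclose(A_m, np.linalg.matrix_power(A, m))

A_inv2 = sum(P / l**2 for l, P in groups)               # A^{-2} = sum lambda_j^{-2} P_lambda_j
assert np.allclose(A_inv2, np.linalg.matrix_power(np.linalg.inv(A), 2))
```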




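Example 7.9 Find the spectral decomposition of

$$A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}.$$

Solution: From Example 7.6, the spectral decomposition is

$$A = 2(P_1 + P_2) + 8P_3,$$

where the projections are

$$P_1 = u_1 u_1^H = \frac12\begin{bmatrix} -1 \\ 1 \\ 0 \end{bmatrix}\begin{bmatrix} -1 & 1 & 0 \end{bmatrix} = \frac12\begin{bmatrix} 1 & -1 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},$$

$$P_2 = u_2 u_2^H = \frac16\begin{bmatrix} -1 \\ -1 \\ 2 \end{bmatrix}\begin{bmatrix} -1 & -1 & 2 \end{bmatrix} = \frac16\begin{bmatrix} 1 & 1 & -2 \\ 1 & 1 & -2 \\ -2 & -2 & 4 \end{bmatrix},$$

$$P_3 = u_3 u_3^H = \frac13\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}\begin{bmatrix} 1 & 1 & 1 \end{bmatrix} = \frac13\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$$

Hence,

$$P_1 + P_2 = \frac13\begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}$$

is the projection onto the eigenspace $E_2$ belonging to $\lambda = 2$, $P_3$ is the projection onto the eigenspace $E_8$ belonging to $\lambda = 8$, and

$$A = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix} = \frac23\begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix} + \frac83\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}.$$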


Problem 7.25 Given $A = \begin{bmatrix} 0 & 2 & -1 \\ 2 & 3 & -2 \\ -1 & -2 & 0 \end{bmatrix}$, find an orthogonal matrix $Q$ that diagonalizes $A$, and find the spectral decomposition of $A$.
Example 7.10 Find the spectral decomposition of the normal matrix

$$A = \begin{bmatrix} 0 & 0 & i \\ 0 & i & 0 \\ i & 0 & 0 \end{bmatrix}.$$




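Solution: Since $A$ is normal ($AA^H = A^H A$), it is unitarily diagonalizable. The characteristic polynomial of $A$ is

$$\det(\lambda I - A) = (\lambda - i)^2(\lambda + i).$$

Hence, the eigenvalues are $\lambda_1 = \lambda_2 = i$, of multiplicity 2, and $\lambda_3 = -i$, of multiplicity 1. By a simple computation using the Gram-Schmidt orthogonalization, one can find that

$$u_1 = \frac{1}{\sqrt2}(1, 0, 1), \qquad u_2 = (0, 1, 0), \qquad u_3 = \frac{1}{\sqrt2}(-1, 0, 1)$$

are orthonormal eigenvectors of $A$ belonging to the eigenvalues $\lambda_1$, $\lambda_2$ and $\lambda_3$, respectively. Now, the spectral decomposition is $A = i(P_1 + P_2) - iP_3$, where the projection matrices are

$$P_1 = u_1 u_1^H = \frac12\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}\begin{bmatrix} 1 & 0 & 1 \end{bmatrix} = \frac12\begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{bmatrix},$$

$$P_2 = u_2 u_2^H = \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}\begin{bmatrix} 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},$$

$$P_3 = u_3 u_3^H = \frac12\begin{bmatrix} -1 \\ 0 \\ 1 \end{bmatrix}\begin{bmatrix} -1 & 0 & 1 \end{bmatrix} = \frac12\begin{bmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{bmatrix}.$$

Hence,

$$A = \frac{i}{2}\begin{bmatrix} 1 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 1 \end{bmatrix} - \frac{i}{2}\begin{bmatrix} 1 & 0 & -1 \\ 0 & 0 & 0 \\ -1 & 0 & 1 \end{bmatrix}.$$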

Problem 7.26 Find the spectral decomposition of each of the following matrices: 


o f2 1 Of 1 24i 
O25 5) Or armel | 

1 0 0 0 |: 1 1 1 

02 00 1 000 

O 00 2 0? ae 1 0 0 0 

lo 004 PEES 




Problem 7.27 Let A be a normal matrix. Show that A is Hermitian if and only 
if all the eigenvalues of A are real. 


Problem 7.28 Let $A = \lambda_1 P_{\lambda_1} + \cdots + \lambda_k P_{\lambda_k}$ be the spectral decomposition of a normal matrix $A$. Show that each $P_{\lambda_i}$ is a polynomial in $A$.


7.6 Singular value decomposition 


In the earlier chapters, we have seen that a matrix $A$ can have a couple of decompositions: the LU and QR decompositions. There is another important decomposition, called the singular value decomposition, which has become a fundamental tool in scientific computing.

Let $A$ be an $m \times n$ matrix of rank $k$. Then $A^H A$ is Hermitian with real eigenvalues and has a complete set of orthonormal eigenvectors $\{u_1, \ldots, u_n\}$ for $\mathbb{C}^n$, so that

$$A^H A u_i = \lambda_i u_i, \qquad u_i^H u_j = \delta_{ij}.$$

(When $A$ is a real matrix, we take $A^T$ and $u_i^T$ instead of $A^H$ and $u_i^H$.) Since

$$\|Au_i\|^2 = u_i^H A^H A u_i = \lambda_i u_i^H u_i = \lambda_i,$$

we have $\lambda_i \ge 0$ for all $i = 1, \ldots, n$. Thus, we may assume that $\lambda_1, \ldots, \lambda_k$ are positive and the remaining $n - k$ eigenvalues $\lambda_{k+1}, \ldots, \lambda_n$ are zero. Then the first $k$ vectors $u_1, \ldots, u_k$ form an orthonormal basis for $C(A^H)$ (the row space $R(A)$ when $A$ is real), and the remaining $n - k$ vectors $u_{k+1}, \ldots, u_n$ form an orthonormal basis for the eigenspace $E_0 = N(A^H A) = N(A)$, since $Au_j = 0 = 0u_j$ for $j > k$. Let $\sigma_i = \sqrt{\lambda_i}$ and


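$$v_i = \frac{1}{\sigma_i}Au_i, \quad\text{or}\quad Au_i = \sigma_i v_i \in C(A) \subseteq \mathbb{C}^m, \qquad \text{for } i \le k.$$

Then $\{v_1, \ldots, v_k\}$ is an orthonormal basis for $C(A)$ in $\mathbb{C}^m$, since

$$v_i^H v_j = \frac{u_i^H A^H A u_j}{\sigma_i \sigma_j} = \frac{\lambda_j u_i^H u_j}{\sigma_i \sigma_j} = \delta_{ij}.$$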
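Extend this to an orthonormal basis $\{v_1, \ldots, v_k, v_{k+1}, \ldots, v_m\}$ for $\mathbb{C}^m$ by the Gram-Schmidt process, and let $Q_1$ and $Q_2$ be the unitary matrices defined as

$$Q_1 = [u_1 \cdots u_n], \qquad Q_2 = [v_1 \cdots v_m].$$

Then

$$Au_i = \begin{cases} \sigma_i v_i, & i \le k,\\ 0, & k < i \le n,\end{cases} \qquad A^H v_i = \begin{cases} \sigma_i u_i, & i \le k,\\ 0, & k < i \le m.\end{cases}$$

Since, for each $i \le k$,

$$v_j^H A u_i = \sigma_i v_j^H v_i = \begin{cases} \sigma_i, & \text{if } j = i,\\ 0, & \text{if } j \neq i,\end{cases}$$

we get

$$Q_2^H A Q_1 = Q_2^H\,[Au_1 \cdots Au_k \; 0 \cdots 0] = Q_2^H\,[\sigma_1 v_1 \cdots \sigma_k v_k \; 0 \cdots 0] = \begin{bmatrix} \Sigma_k & 0 \\ 0 & 0 \end{bmatrix}_{m \times n} = \Sigma, \qquad \Sigma_k = \begin{bmatrix} \sigma_1 & & 0 \\ & \ddots & \\ 0 & & \sigma_k \end{bmatrix}.$$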


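The nonzero numbers $\sigma_i = \sqrt{\lambda_i}$ on the diagonal of the $k \times k$ upper left block $\Sigma_k$ are called the singular values. Since the $Q_i$ are unitary, we get $A = Q_2 \Sigma Q_1^H$. Note that, by multiplying $Au_i = \sigma_i v_i$ by $AA^H$ for $i \le k$, we get $\lambda_i Au_i = \sigma_i AA^H v_i$, or

$$AA^H v_i = \lambda_i \frac{Au_i}{\sigma_i} = \lambda_i v_i, \qquad i = 1, \ldots, k.$$

That is, the vectors $v_i = \frac{1}{\sigma_i}Au_i$, $i \le k$, are orthonormal eigenvectors of $AA^H$ forming a basis for $C(A)$. The last $m - k$ vectors $v_{k+1}, \ldots, v_m$ in $Q_2$ form an orthonormal basis for $N(A^H) = C(A)^\perp$, so they are eigenvectors of $AA^H$ belonging to the eigenvalue 0. Altogether, the columns of $Q_2$ form a set of orthonormal eigenvectors of $AA^H$.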
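The construction above can be followed step by step in NumPy. The sketch below is illustrative and is not a numerically robust SVD; in practice one calls `np.linalg.svd`. The name `svd_via_eigen` is hypothetical, and the Gram-Schmidt extension of $\{v_1, \ldots, v_k\}$ is replaced by a QR factorization of $[V \mid I]$. The test matrix is an arbitrary $3 \times 4$ matrix of rank 2.

```python
import numpy as np

def svd_via_eigen(A, tol=1e-10):
    """Return (Q2, Sigma, Q1) with A = Q2 Sigma Q1^H, built from the eigenvectors of A^H A."""
    A = np.asarray(A, dtype=complex)
    m, n = A.shape
    lam, Q1 = np.linalg.eigh(A.conj().T @ A)   # A^H A u_i = lambda_i u_i, ascending
    order = np.argsort(lam)[::-1]              # positive eigenvalues first
    lam, Q1 = lam[order], Q1[:, order]
    k = int(np.sum(lam > tol))                 # k = rank A
    sigma = np.sqrt(lam[:k])                   # singular values
    V = A @ Q1[:, :k] / sigma                  # v_i = A u_i / sigma_i, i <= k
    # Extend {v_1, ..., v_k} to an orthonormal basis of C^m.
    Q2, _ = np.linalg.qr(np.column_stack([V, np.eye(m)]))
    Q2[:, :k] = V                              # same span, so Q2 stays unitary
    Sigma = np.zeros((m, n))
    Sigma[:k, :k] = np.diag(sigma)
    return Q2, Sigma, Q1

A = np.array([[1.0, 2.0, 0.0, 1.0],
              [2.0, 4.0, 0.0, 2.0],
              [0.0, 1.0, 1.0, 0.0]])          # rank 2 (row 2 = 2 * row 1)
Q2, Sigma, Q1 = svd_via_eigen(A)
assert np.allclose(Q1.conj().T @ Q1, np.eye(4))
assert np.allclose(Q2.conj().T @ Q2, np.eye(3))
assert np.allclose(Q2 @ Sigma @ Q1.conj().T, A)
assert np.allclose(np.diag(Sigma)[:2], np.linalg.svd(A, compute_uv=False)[:2])
```

Forming $A^HA$ explicitly squares the condition number, which is one reason library SVD routines avoid this route. It is used here only because it mirrors the derivation.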


Theorem 7.14 (Singular value decomposition) Any $m \times n$ matrix $A$ of rank $k$ can be factorized into

$$A = Q_2 \Sigma Q_1^H = \sigma_1 v_1 u_1^H + \cdots + \sigma_k v_k u_k^H,$$

where:
- the columns $u_i$ of the $n \times n$ unitary matrix $Q_1$ are the orthonormal eigenvectors of $A^H A$;
- the $k$ nonzero diagonal entries of the $m \times n$ matrix $\Sigma$ are the square roots $\sigma_i = \sqrt{\lambda_i}$ of the positive eigenvalues $\lambda_i$ of both $A^H A$ and $AA^H$, called the singular values of $A$;
- the columns $v_i$ of the $m \times m$ unitary matrix $Q_2$ are the orthonormal eigenvectors of $AA^H$ such that $v_i = \frac{1}{\sigma_i}Au_i$ for $i \le k$.


Remarks: (1) The singular value decomposition may be considered as a generalization of unitary diagonalization to non-square matrices. However, the singular value decomposition does not reduce to the usual diagonal decomposition $A = QDQ^H$ even if $A$ is a normal matrix. The $v_i$'s in $Q_2$ are not only eigenvectors of $AA^H = A^H A$, like the $u_i$'s, but also satisfy $v_i = \frac{1}{\sigma_i}Au_i$. So $A^H A = AA^H$ does not in general imply $Q_1 = Q_2$. Moreover, the diagonal entries of $D$ are the eigenvalues of $A$, whereas those of $\Sigma$ are nonnegative real numbers, namely the square roots of the eigenvalues of $A^H A$.




(2) In the singular value decomposition $A = Q_2\Sigma Q_1^H$ of a matrix $A$, since the upper left $k\times k$ corner of $\Sigma$ is the only nonzero part, and the zeros in the rest of $\Sigma$ kill the last $n - k$ and $m - k$ columns of $Q_1$ and $Q_2$, respectively, we do not need to find those $n - k$ and $m - k$ eigenvectors belonging to the zero eigenvalue for $A^HA$ and $AA^H$. By neglecting those redundant parts in the decomposition, the matrix $A$ is actually a product of $m\times k$, $k\times k$, and $k\times n$ matrices, all of which are of rank $k$: that is, if $P_1$ and $P_2$ are the first $k$ columns of $Q_1$ and $Q_2$ respectively, then

$$A = P_2\Sigma_kP_1^H.$$


In this decomposition of $A$, both $P_i$ have the left inverses $P_i^H$ (that is, $P_i^HP_i = I_k$), and $\Sigma_k$ has the inverse $\Sigma_k^{-1}$. The following example shows the usefulness of this expression when $k \ll \min\{m, n\}$.
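The reduced form $A = P_2\Sigma_kP_1^H$ can be read off from a full SVD by slicing. The sketch below assumes NumPy; the rank-2 test matrix is hypothetical. It checks that the three factors reproduce $A$ and that $P_1^H$ and $P_2^H$ are left inverses of $P_1$ and $P_2$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 6, 5, 2
A = rng.standard_normal((m, k)) @ rng.standard_normal((k, n))   # hypothetical rank-2 matrix

Q2, s, Q1H = np.linalg.svd(A)    # A = Q2 Sigma Q1^H
P2 = Q2[:, :k]                   # first k columns of Q2   (m x k)
P1 = Q1H[:k, :].conj().T         # first k columns of Q1   (n x k)
Sigma_k = np.diag(s[:k])         # k x k and invertible

A_compact = P2 @ Sigma_k @ P1.conj().T   # A = P2 Sigma_k P1^H
```

Only $k(m + n + 1)$ numbers are stored this way instead of $mn$, which is the point of the next example.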


Example 7.11 (Application to image processing) Suppose a satellite takes a picture and sends it to the earth. The picture may contain $1{,}000\times1{,}000$ pixels. Each pixel has a gray level between black and white, which can be coded by a number. Thus there are $1{,}000{,}000$ numbers to be sent to the earth, which form a $1{,}000\times1{,}000$ matrix $A$. Suppose that some, say 50, of the singular values $\sigma_1, \ldots, \sigma_{50}$ are significant and the rest are extremely small; then we get the (approximate) singular value decomposition

$$A \approx P_2\Sigma_{50}P_1^T = \sigma_1v_1u_1^T + \cdots + \sigma_{50}v_{50}u_{50}^T,$$

in which $v_i, u_i \in \mathbb{R}^{1000}$, $i = 1, \ldots, 50$. Thus it is good enough to send only $2\times50\times1{,}000 = 100{,}000$ numbers (the 50 columns in each of $P_1$ and $P_2$) together with the 50 singular values.
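A scaled-down version of Example 7.11 can be sketched with NumPy. The "image" here is a hypothetical synthetic matrix (a rank-10 part plus tiny noise standing in for the extremely small singular values), and the sizes 200 and 10 stand in for 1,000 and 50. The helper `truncated_svd` is an illustrative name, not from the text.

```python
import numpy as np

def truncated_svd(A, r):
    """Keep the r largest singular triples: A_r = sigma_1 v_1 u_1^T + ... + sigma_r v_r u_r^T."""
    Q2, s, Q1T = np.linalg.svd(A, full_matrices=False)
    return Q2[:, :r], s[:r], Q1T[:r, :]          # P2, diagonal of Sigma_r, P1^T

rng = np.random.default_rng(2)
N, r = 200, 10                                   # stand-ins for 1,000 and 50
image = rng.standard_normal((N, r)) @ rng.standard_normal((r, N))   # "significant" part
image += 1e-6 * rng.standard_normal((N, N))                          # tiny remaining singular values

P2, S_r, P1T = truncated_svd(image, r)
approx = (P2 * S_r) @ P1T                        # P2 Sigma_r P1^T (scales column i of P2 by sigma_i)
numbers_sent = P2.size + P1T.size + S_r.size     # 2 * r * N + r
rel_err = np.linalg.norm(image - approx) / np.linalg.norm(image)
```

Here only $2\cdot10\cdot200 + 10 = 4{,}010$ numbers are kept instead of $40{,}000$, and the discarded part has Frobenius norm $\big(\sum_{i>r}\sigma_i^2\big)^{1/2}$, which is tiny by assumption.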


If $A$ is a square matrix, then we may insert $Q_1^HQ_1 = I$ or $Q_2^HQ_2 = I$ in $A = Q_2\Sigma Q_1^H$ to get

$$A = (Q_2Q_1^H)(Q_1\Sigma Q_1^H) = (Q_2\Sigma Q_2^H)(Q_2Q_1^H).$$

Here $Q_1\Sigma Q_1^H = S$ and $Q_2\Sigma Q_2^H = P$ are Hermitian with nonnegative eigenvalues, and $Q_2Q_1^H = Q$ is a unitary matrix. Thus, when $A$ is a real matrix, the orthogonal part $Q$ represents a rotation or reflection, and the eigenvalues in the symmetric parts $S$ and $P$ represent stretching or compression, just like a complex number $z$ is expressed in polar form as $z = |z|e^{i\theta}$.


Theorem 7.15 (Polar decomposition) Every square matrix $A$ can be decomposed into $A = QS = PQ$, where $Q = Q_2Q_1^H$ is unitary and $S = Q_1\Sigma Q_1^H$ and $P = Q_2\Sigma Q_2^H$ are Hermitian with nonnegative eigenvalues. These are called the polar decompositions of $A$.
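Theorem 7.15 translates directly into a few lines once an SVD routine is available. This is a minimal sketch assuming NumPy; the function name `polar` and the $2\times2$ test matrix are illustrative choices, not from the text.

```python
import numpy as np

def polar(A):
    """Polar decompositions A = Q S = P Q built from the SVD A = Q2 Sigma Q1^H."""
    Q2, s, Q1H = np.linalg.svd(A)
    Q1 = Q1H.conj().T
    Sigma = np.diag(s)
    Q = Q2 @ Q1H                    # unitary factor Q = Q2 Q1^H
    S = Q1 @ Sigma @ Q1H            # S = Q1 Sigma Q1^H, Hermitian with eigenvalues >= 0
    P = Q2 @ Sigma @ Q2.conj().T    # P = Q2 Sigma Q2^H, Hermitian with eigenvalues >= 0
    return Q, S, P

A = np.array([[1.0, 2.0], [-3.0, 0.5]])    # hypothetical real example
Q, S, P = polar(A)
```

For this real $A$, $Q$ is orthogonal and $S$, $P$ are symmetric positive semidefinite, matching the rotation/reflection and stretching interpretation above.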




7.7 The solutions of the general systems 


In Section 5.7, we have seen that a solution of the normal equation $A^TAx = A^Tb$ for a system of linear equations $Ax = b$ gives a general method for solving the system when the rank of $A$ is $n$, the number of columns of $A$. In this case, $A^TA$ is invertible and so one can find a unique least squares solution $x_0 = (A^TA)^{-1}A^Tb$. However, in the most general case, the rank of $A$ may be less than $n$, so that the columns of $A$ are dependent. Then $\dim N(A) > 0$, and neither does the above formula work nor is the least squares solution unique, since all the vectors in $x_0 + N(A)$ are solutions.

In the most general case, the singular value decomposition of $A$ provides a method of solving the general normal equation $A^HAx = A^Hb$. Among the many least squares solutions, there is only one least squares solution with minimum length, which is called the optimal solution and denoted by $x^+$. In the following, we discuss how to find it.

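Since $\mathbb{C}^n = R(A) \oplus N(A)$, any least squares solution $x_0$ may be written uniquely as $x_0 = x_r + x_n$ with $x_r \in R(A)$ and $x_n \in N(A)$. The component $x_r$ is the same for all least squares solutions, since any two of them differ by a vector in $N(A)$. Since $\|x_0\|^2 = \|x_r\|^2 + \|x_n\|^2$, $\|x_0\|$ is minimum when and only when $x_n = 0$. That is, $x^+$ is nothing but $x_r = \mathrm{Proj}_{R(A)}x_0 \in R(A)$ for any least squares solution $x_0$. Thus, we are looking for the solutions of

$$Ax_0 = b_c = \mathrm{Proj}_{C(A)}b, \quad\text{and}\quad x^+ = x_r = \mathrm{Proj}_{R(A)}x_0.$$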


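To compute these solutions, take the singular value decomposition $A = P_2\Sigma_kP_1^H$ of $A$. Since $A^HA = P_1\Sigma_kP_2^HP_2\Sigma_kP_1^H = P_1\Sigma_k^2P_1^H$, the normal equation $A^HAx_0 = A^Hb$ can be written as

$$P_1\Sigma_k^2P_1^Hx_0 = P_1\Sigma_kP_2^Hb, \quad\text{or}\quad P_1^Hx_0 = \Sigma_k^{-1}P_2^Hb$$

(multiply on the left by $\Sigma_k^{-2}P_1^H$ and use $P_1^HP_1 = I_k$), where $\Sigma_k^{-1} = \begin{bmatrix}1/\sigma_1 & & 0\\ & \ddots & \\ 0 & & 1/\sigma_k\end{bmatrix}$. Note that $\alpha = \{u_1, \ldots, u_k\}$ forming $P_1$ and $\beta = \{v_1, \ldots, v_k\}$ forming $P_2$ are orthonormal bases for $R(A) \subseteq \mathbb{C}^n$ and $C(A) \subseteq \mathbb{C}^m$, respectively. Thus

$$P_1^Hx_0 = \begin{bmatrix}u_1^Hx_0\\ \vdots\\ u_k^Hx_0\end{bmatrix} = \Sigma_k^{-1}\begin{bmatrix}v_1^Hb\\ \vdots\\ v_k^Hb\end{bmatrix} = \Sigma_k^{-1}[b_c]_\beta.$$

Thus $P_1^Hx_0 = \Sigma_k^{-1}P_2^Hb = \Sigma_k^{-1}[b_c]_\beta$, and

$$P_1P_1^Hx_0 = (u_1^Hx_0)u_1 + \cdots + (u_k^Hx_0)u_k = x_r = \mathrm{Proj}_{R(A)}x_0$$
$$= \frac{v_1^Hb}{\sigma_1}u_1 + \cdots + \frac{v_k^Hb}{\sigma_k}u_k = P_1\Sigma_k^{-1}P_2^Hb \in R(A).$$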




Such an $n\times m$ matrix $P_1\Sigma_k^{-1}P_2^H$ is called the pseudo-inverse of $A$, and is denoted by $A^+ = P_1\Sigma_k^{-1}P_2^H$.


Theorem 7.16 Let $A = P_2\Sigma_kP_1^H$ be the singular value decomposition of $A$ in a normal equation $A^HAx = A^Hb$. Then the unique optimal least squares solution $x^+ \in R(A)$ in the row space of $A$ can be obtained as

$$x^+ = x_r = P_1\Sigma_k^{-1}P_2^Hb = A^+b.$$

Moreover, the set of all least squares solutions is $x_r + N(A)$.
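Theorem 7.16 can be checked on a small rank-deficient system. The sketch below assumes NumPy; `pseudo_inverse` is an illustrative helper that builds $A^+ = P_1\Sigma_k^{-1}P_2^H$ from a reduced SVD, and the $4\times3$ system is hypothetical (its third column is the sum of the first two, so $N(A)$ is spanned by $(1, 1, -1)$). For comparison, `numpy.linalg.lstsq` is documented to return the minimum-norm least squares solution, and `numpy.linalg.pinv` computes the pseudo-inverse.

```python
import numpy as np

def pseudo_inverse(A, tol=1e-12):
    """A^+ = P1 Sigma_k^{-1} P2^H from the compact SVD A = P2 Sigma_k P1^H."""
    Q2, s, Q1H = np.linalg.svd(A, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))               # numerical rank
    P2, P1 = Q2[:, :k], Q1H[:k, :].conj().T
    return P1 @ np.diag(1.0 / s[:k]) @ P2.conj().T

# Hypothetical rank-deficient system: third column = first + second
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, -1.0, 0.0]])
b = np.array([1.0, 2.0, 0.0, 1.0])

x_plus = pseudo_inverse(A) @ b            # optimal least squares solution x^+ = A^+ b
null_vec = np.array([1.0, 1.0, -1.0])     # spans N(A)
x_other = x_plus + 0.7 * null_vec         # another least squares solution in x_r + N(A)
```

Both `x_plus` and `x_other` satisfy the normal equation, but only `x_plus` is orthogonal to $N(A)$ (i.e., lies in the row space), and it is the shorter of the two.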


If we had started with $A = Q_2\Sigma Q_1^H$, then we would have ended up with $A^+ = Q_1\Sigma^+Q_2^H$, where $\Sigma^+$ is the $n\times m$ zero matrix except for $\Sigma_k^{-1}$ in the upper left corner.

If $\operatorname{rank}A = n$, i.e., the columns are independent, then its pseudo-inverse $A^+ = Q_1\Sigma_n^{-1}P_2^H$ is the left inverse $(A^HA)^{-1}A^H$ of $A$, and if $\operatorname{rank}A = m$, i.e., the rows are independent, then its pseudo-inverse $A^+ = P_1\Sigma_m^{-1}Q_2^H$ is the right inverse $A^H(AA^H)^{-1}$ of $A$. Thus, if $A$ is an invertible square matrix, then the pseudo-inverse $A^+$ and the inverse $A^{-1}$ coincide.
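The three special cases can be compared numerically with `numpy.linalg.pinv`. This is a minimal sketch assuming NumPy; the matrices are hypothetical examples chosen to have independent columns, independent rows, and full rank, respectively.

```python
import numpy as np

A_tall = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]])   # rank 2 = n: independent columns
A_wide = A_tall.T                                          # rank 2 = m: independent rows
M = np.array([[2.0, 1.0], [1.0, 3.0]])                     # invertible square matrix

left_inverse = np.linalg.inv(A_tall.T @ A_tall) @ A_tall.T     # (A^H A)^{-1} A^H
right_inverse = A_wide.T @ np.linalg.inv(A_wide @ A_wide.T)    # A^H (A A^H)^{-1}
```

In practice the SVD route is preferred over forming $A^HA$ explicitly, because $A^HA$ squares the condition number; the explicit formulas are used here only to confirm the identities.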


Figure 7.1: Pseudo-inverse and projections ($\mathbb{C}^n = R(A) \oplus N(A)$, $\mathbb{C}^m = C(A) \oplus N(A^H)$; $x_r = A^+b = A^+b_c$ and $b_c = Ax_r = AA^+b$)


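Note that $A^+: \mathbb{C}^m \to R(A) \subseteq \mathbb{C}^n$, and

$$A^+v_i = \begin{cases}\frac{1}{\sigma_i}u_i, & \text{for } 1 \le i \le k,\\ 0, & \text{for } k < i \le m.\end{cases}$$

Moreover, the column vectors in $P_2$ and $P_1$ are also orthonormal eigenvectors of $(A^+)^HA^+$ and $A^+(A^+)^H$, respectively. Thus, it is easy to see that $(A^+)^+ = (P_1\Sigma_k^{-1}P_2^H)^+ = P_2\Sigma_kP_1^H = A$, and the projections are given as follows:

$$A^+A = P_1P_1^H = u_1u_1^H + \cdots + u_ku_k^H = \mathrm{Proj}_{R(A)}: \mathbb{C}^n \to R(A) \subseteq \mathbb{C}^n,$$
$$AA^+ = P_2P_2^H = v_1v_1^H + \cdots + v_kv_k^H = \mathrm{Proj}_{C(A)}: \mathbb{C}^m \to C(A) \subseteq \mathbb{C}^m.$$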
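The projection formulas can be verified on a hypothetical $3\times3$ matrix of rank 2, assuming NumPy. Both products $A^+A$ and $AA^+$ should be symmetric and idempotent, with trace equal to the rank, and taking the pseudo-inverse twice should return $A$.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],
              [1.0, 0.0, 1.0]])    # hypothetical matrix of rank 2 (row 2 = 2 * row 1)
Ap = np.linalg.pinv(A)

proj_row = Ap @ A                  # A^+ A = P1 P1^H : projection onto R(A)
proj_col = A @ Ap                  # A A^+ = P2 P2^H : projection onto C(A)
```

Since the rows of $A$ lie in $R(A)$, `proj_row` leaves each of them unchanged.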




Therefore, Theorem 7.16 concludes the theory of the system $Ax = b$ of linear equations, as summarized in Figure 7.1, showing the most general and complete (least squares) solution as $x_r + N(A)$, where $x_r = x^+ = A^+b \in R(A)$.


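Example 7.12 Consider a system $Ax = b$ given as $-x + y + z = 6$. Then $A = [-1\ \ 1\ \ 1]$, and the vector $x_r$ in $R(A)$ can easily be found: writing $x_r = t(-1, 1, 1)$, the equation $Ax_r = 3t = 6$ gives $t = 2$, so $x_r = (-2, 2, 2)$. From the equation $x_r = \begin{bmatrix}-2\\2\\2\end{bmatrix} = A^+[6]$, we get $A^+ = \begin{bmatrix}-1/3\\1/3\\1/3\end{bmatrix}$, which is in $R(A)$.

In fact, the eigenvalue of $AA^H$ is 3, the singular value is $\sqrt3$, and $P_2 = [1]$; $P_1 = [u] = \frac{1}{\sqrt3}\begin{bmatrix}-1\\1\\1\end{bmatrix}$ is an orthonormal basis for $R(A)$, and the singular value decomposition of $A$ is

$$A = [-1\ \ 1\ \ 1] = P_2\Sigma_kP_1^H = [1]\,[\sqrt3]\left[-\tfrac{1}{\sqrt3}\ \ \tfrac{1}{\sqrt3}\ \ \tfrac{1}{\sqrt3}\right],$$

and

$$A^+ = P_1\Sigma_k^{-1}P_2^H = \frac{1}{\sqrt3}\begin{bmatrix}-1\\1\\1\end{bmatrix}\left[\tfrac{1}{\sqrt3}\right][1] = \begin{bmatrix}-1/3\\1/3\\1/3\end{bmatrix}.$$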

Problem 7.29 Find the singular value decomposition and the pseudo-inverse of 
each of the following matrices: 


7.8 The eigenvalue problem 


We have seen that the eigenvalues of a matrix play very important roles in solving problems. However, since they are the roots of the characteristic polynomial, they are not easy to compute at all: there is no general formula for the roots of a polynomial of degree higher than 4, and in general it is not possible to calculate the roots exactly in a finite number of steps. Even expanding the determinant $\det(A - \lambda I)$ into a polynomial in $\lambda$ is not simple. Therefore, we need some practical methods to compute the eigenvalues without solving the characteristic equation.

In this section, we introduce some iterative methods for practical computations. Many, perhaps most, eigenvalue problems which occur in applications involve symmetric matrices. In those cases, the eigenvalue problems are much easier to solve, in part because we can avoid complex arithmetic, but also for some other reasons. Note also that any given polynomial has the same roots as the characteristic polynomial of its companion matrix introduced in Section 6.3.1 (see Lemma 6.11), so that these methods may be used to compute the roots of a polynomial in general.
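To illustrate root finding through a companion matrix, here is a sketch assuming NumPy. The layout used by the hypothetical helper `companion` (ones on the subdiagonal, negated coefficients in the last column) is one common convention and may differ from the arrangement in Lemma 6.11; any such layout has the same characteristic polynomial.

```python
import numpy as np

def companion(coeffs):
    """Companion matrix of the monic polynomial x^n + c_{n-1} x^{n-1} + ... + c_0.

    coeffs = [c_0, c_1, ..., c_{n-1}]; ones on the subdiagonal, -c_i in the last column.
    """
    n = len(coeffs)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)
    C[:, -1] = -np.asarray(coeffs, dtype=float)
    return C

# (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
C = companion([-6.0, 11.0, -6.0])
roots = np.sort(np.linalg.eigvals(C).real)   # eigenvalues of C are the roots
```

NumPy's own `numpy.roots` works the same way, computing the eigenvalues of a companion matrix.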


7.8.1 Jacobi method 


Even though this method of Jacobi for symmetric matrices is not the fastest, it has the advantages that it is simple to program and to analyze, and it is no less stable than the more sophisticated methods.

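This method constructs a sequence of similarity transformations

$$A_{n+1} = Q_n^TA_nQ_n, \qquad A_0 = A,$$

with the Givens rotation matrices $Q_n$ defined in Section 5.11. Clearly, this similarity preserves the eigenvalues of $A$, and as we have seen in that section, by suitable choices of $c$ and $s$ in $Q_{ij}$, we want $Q_{ij}^TAQ_{ij}$ to have zeros in both positions $(i,j)$ and $(j,i)$. In fact, by a direct computation of $B = Q_{ij}^TAQ_{ij}$, we get

$$\begin{aligned} b_{ij} = b_{ji} &= cs(a_{jj} - a_{ii}) + (c^2 - s^2)a_{ij},\\ b_{ii} &= c^2a_{ii} + s^2a_{jj} + 2cs\,a_{ij},\\ b_{jj} &= s^2a_{ii} + c^2a_{jj} - 2cs\,a_{ij}. \end{aligned}$$

By setting $b_{ij} = 0 = b_{ji}$ (if $a_{ij} = 0$ there is nothing to do), we get

$$\frac{c^2 - s^2}{cs} = \frac{a_{ii} - a_{jj}}{a_{ij}} = 2\beta.$$

Since $c = \sqrt{1 - s^2}$, squaring $c^2 - s^2 = 2\beta cs$ gives

$$s^4 - s^2 + \frac{1}{4(1 + \beta^2)} = 0, \quad\text{or}\quad s^2 = \frac12\left(1 \pm \frac{\beta}{\sqrt{1 + \beta^2}}\right).$$

Taking nonnegative square roots, we choose

$$s = \left[\frac12 - \frac{\beta}{2\sqrt{1 + \beta^2}}\right]^{1/2}, \qquad c = \left[\frac12 + \frac{\beta}{2\sqrt{1 + \beta^2}}\right]^{1/2};$$

one checks directly that then $c^2 - s^2 = \beta/\sqrt{1 + \beta^2} = 2\beta cs$, so that indeed $b_{ij} = 0$. Then we will have a zero in any specified off-diagonal position, while preserving the eigenvalues of $A$.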
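One Jacobi step can be sketched in NumPy using the text's sign convention (rows $i$ and $j$ of $Q_{ij}^T$ are $ce_i + se_j$ and $-se_i + ce_j$). The helper name `jacobi_rotation` and the test matrix are hypothetical. As a numerical-accuracy choice not discussed in the text, the larger of $c$, $s$ is taken from its formula and the other from $cs = 1/(2\sqrt{1 + \beta^2})$, which gives the same values while avoiding cancellation.

```python
import numpy as np

def jacobi_rotation(A, i, j):
    """Givens matrix Q_ij with (Q^T A Q)[i, j] = 0 for a symmetric A (assumes A[i, j] != 0).

    Rows of Q^T: row i = c e_i + s e_j, row j = -s e_i + c e_j.
    """
    beta = (A[i, i] - A[j, j]) / (2.0 * A[i, j])
    r = np.hypot(1.0, beta)                       # sqrt(1 + beta^2)
    # c^2 = 1/2 + beta/(2r), s^2 = 1/2 - beta/(2r), and cs = 1/(2r)
    if beta >= 0:
        c = np.sqrt(0.5 + beta / (2.0 * r)); s = 1.0 / (2.0 * r * c)
    else:
        s = np.sqrt(0.5 - beta / (2.0 * r)); c = 1.0 / (2.0 * r * s)
    Q = np.eye(A.shape[0])
    Q[i, i] = Q[j, j] = c
    Q[j, i], Q[i, j] = s, -s                      # Q is the transpose of Q^T described above
    return Q

A = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.5],
              [2.0, 0.5, 1.0]])                   # hypothetical symmetric matrix
Q = jacobi_rotation(A, 0, 2)
B = Q.T @ A @ Q                                   # b_02 = b_20 = 0, same eigenvalues as A
```

The result also illustrates the next theorem: the sum of squares of the diagonal of `B` exceeds that of `A` by exactly $2a_{02}^2$.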




It would be nice if we could make all the off-diagonal entries of $A$ zero in succession, so that after $n(n-1)/2$ transformations we would have a diagonal matrix similar to $A$, whose diagonal entries are the eigenvalues of $A$. Unfortunately, this is not the case. When we make a new zero in an off-diagonal position, a previous zero may become nonzero; that is, every time we knock one off-diagonal entry out, others pop back up. Fortunately, although the transformations never produce a diagonal matrix in a finite number of steps, they do make steady progress toward that goal:


Theorem 7.17 In the transformation $B = Q_{ij}^TAQ_{ij}$ of a symmetric matrix $A$ with $b_{ij} = 0$, the sum of the squares of the diagonal entries increases by $2a_{ij}^2$, while the sum of the squares of the off-diagonal entries decreases by the same amount.


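Proof: Since $c^2 + s^2 = 1$, using the equations for $b_{ii}$, $b_{jj}$ and $b_{ij}$ above, a messy but direct computation gives

$$b_{ii}^2 + b_{jj}^2 + 2b_{ij}^2 = a_{ii}^2 + a_{jj}^2 + 2a_{ij}^2,$$

so that $b_{ii}^2 + b_{jj}^2 = a_{ii}^2 + a_{jj}^2 + 2a_{ij}^2$, since $b_{ij} = 0$. Since only the $i$-th and $j$-th rows and the $i$-th and $j$-th columns of $A$ are changed during a Jacobi transformation, $a_{ii}$ and $a_{jj}$ are the only diagonal entries to change. That is,

$$\sum_{k=1}^nb_{kk}^2 = 2a_{ij}^2 + \sum_{k=1}^na_{kk}^2.$$

We next show that the sum of the squares of all the entries of $B = Q_{ij}^TAQ_{ij}$ does not change. Let $a_k$ and $p_k$ be the $k$-th rows of $A$ and $P = Q_{ij}^TA$, respectively. Then only the $i$-th and $j$-th rows of $A$ and $P$ are different:

$$p_i = ca_i + sa_j, \qquad p_j = -sa_i + ca_j.$$

Thus

$$p_ip_i^T + p_jp_j^T = c^2a_ia_i^T + 2cs\,a_ia_j^T + s^2a_ja_j^T + s^2a_ia_i^T - 2sc\,a_ia_j^T + c^2a_ja_j^T = a_ia_i^T + a_ja_j^T.$$

Similarly, writing $p^l$ and $b^l$ for the $l$-th columns, the columns of $P$ and $B = PQ_{ij} = Q_{ij}^TAQ_{ij}$ are the same except for

$$b^i = cp^i + sp^j, \qquad b^j = -sp^i + cp^j.$$

By the same computation, the sum of the squares of the entries does not change. Since the total sum of squares is unchanged while the diagonal part increases by $2a_{ij}^2$, the sum of the squares of the off-diagonal entries decreases by $2a_{ij}^2$.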




In practical computation, this theorem suggests that it may be a waste of time to try to make zero some entries which are already very small. That is, we may need some strategy for determining which entry $a_{ij}$ to make zero, and in what order. For this we may check all the off-diagonal entries in a systematic order, and make zero only those entries whose squares exceed half the average of the squares of all off-diagonal entries. The following theorem helps to determine the rate of convergence. Let $[k_{ij}]$ denote the matrix obtained from $A$ after $k$ iterations of the Jacobi method, and

$$e_k = \sum_{i\neq j}k_{ij}^2.$$

Theorem 7.18 Suppose that at every step, say the $(k+1)$-st, we make zero an entry $k_{ij}$ such that $k_{ij}^2 \ge \frac{e_k}{2n(n-1)}$. Then the convergence criterion

$$e_{k+1} \le \epsilon^2e_0 = \epsilon^2\sum_{i\neq j}a_{ij}^2$$

is satisfied after at most $k + 1 = n^2\ln(1/\epsilon^2)$ iterations.

Proof: By Theorem 7.17 and the choice of $k_{ij}$ (using $1 - x \le e^{-x}$),

$$e_{k+1} = e_k - 2k_{ij}^2 \le e_k\left(1 - \frac{1}{n(n-1)}\right) \le e_k\left(1 - \frac{1}{n^2}\right) \le e_ke^{-1/n^2}.$$

Applying this at every step gives, when $k + 1 = n^2\ln(1/\epsilon^2)$,

$$e_{k+1} \le e_0e^{-(k+1)/n^2} = e_0e^{-\ln(1/\epsilon^2)} = \epsilon^2e_0 = \epsilon^2\sum_{i\neq j}a_{ij}^2.$$


In summary, for a symmetric matrix $A$, at each step of the Jacobi iteration choose $Q_k = Q_{ij}$ as in Theorem 7.18 for some $i, j$. Then $A_{k+1} = Q_k^TA_kQ_k$ with $A_0 = A$, and after sufficiently many, say $N$, steps we reach a nearly diagonal matrix $D = A_N$. Here, we take

$$D = Q_{N-1}^T\cdots Q_1^TQ_0^TAQ_0Q_1\cdots Q_{N-1}, \qquad Q = Q_0Q_1\cdots Q_{N-1},$$

so that $D = Q^TAQ$ or $AQ = QD$. That is, the columns of $Q$ are the (approximate) orthonormal eigenvectors of $A$ corresponding to the eigenvalues in the diagonal of $D$.




7.8.2 QR method 


The Jacobi method can be modified to knock out $a_{ji}$ by similarity even when $A$ is not symmetric, and by iteratively knocking out the entries below the diagonal, $A$ may be transformed into an upper triangular form whose diagonal entries are the eigenvalues of $A$; but certainly not in a finite number of steps, and in general not even in the limit. Indeed, if a nonsymmetric real matrix has any complex eigenvalues, there is no hope of convergence to a triangular form, since real matrix multiplications cannot produce complex diagonal entries.

If we are willing to accept an upper triangular matrix with one extra diagonal just below the main diagonal (called the subdiagonal), which is called an upper Hessenberg form, we can achieve this in a finite number of steps. If $A$ is symmetric, the final Hessenberg form will also be symmetric, and so it becomes a tridiagonal form. Once we have reduced $A$ to a similar Hessenberg form, we still have work to do to find the eigenvalues of $A$.

(1) We use Givens rotation matrices $Q_{ij}$ again, but this time to zero not $a_{ji}$ but $a_{j,i-1}$. Note that pre-multiplication by $Q_{ij}^T$ modifies the $i$-th and $j$-th rows of $A$, and post-multiplication by $Q_{ij}$ modifies the $i$-th and $j$-th columns. Thus, if we choose $c$ and $s$ so that the position $(j, i-1)$ is zeroed by the pre-multiplication, it will not be altered by the post-multiplication. However, since $j > i$, only $a_{j,i-1}$ for $j = i+1, \ldots, n$ can be eliminated, but not $a_{i,i-1}$ on the first subdiagonal. This is the price we have to pay to keep the zeros created by the pre-multiplication from being destroyed by the next post-multiplication.

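From the computation of $B = Q_{ij}^TAQ_{ij}$, we get

$$b_{j,i-1} = -s\,a_{i,i-1} + c\,a_{j,i-1}, \qquad c^2 + s^2 = 1,$$

and by setting this equal to zero, we get

$$s = \frac{a_{j,i-1}}{\left(a_{i,i-1}^2 + a_{j,i-1}^2\right)^{1/2}}, \qquad c = \frac{a_{i,i-1}}{\left(a_{i,i-1}^2 + a_{j,i-1}^2\right)^{1/2}}.$$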

Of course, if $a_{j,i-1} = 0$ already, we do not need to do this. To keep the zeroed entries zero, we do this column by column, from the first to the $(n-2)$-nd column. With this order, one can easily ensure that zeros created earlier are not destroyed by later steps. By the same arguments as in the proof of Theorem 7.17, each transformation still preserves the sum of the squares of all the entries.
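The column-by-column Givens reduction can be sketched as follows, assuming NumPy. The helper `hessenberg_givens` is hypothetical; in 0-based indexing, `col` is the text's column $i-1$ and `i = col + 1` is the pivot row whose subdiagonal entry is left alone.

```python
import numpy as np

def hessenberg_givens(A):
    """Reduce A to upper Hessenberg form H = Q^T A Q by Givens rotations (sketch)."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    Q = np.eye(n)
    for col in range(n - 2):                 # columns 1, ..., n-2 of the text
        i = col + 1                          # pivot row: a_{i,i-1} stays on the subdiagonal
        for j in range(i + 1, n):
            if H[j, col] == 0.0:
                continue                     # already zero: nothing to do
            r = np.hypot(H[i, col], H[j, col])
            c, s = H[i, col] / r, H[j, col] / r
            G = np.eye(n)
            G[i, i] = G[j, j] = c
            G[j, i], G[i, j] = s, -s         # rows of G^T: c e_i + s e_j and -s e_i + c e_j
            H = G.T @ H @ G                  # zeros H[j, col]; post-multiplication leaves column col alone
            Q = Q @ G
    return H, Q

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))              # hypothetical nonsymmetric test matrix
H, Q = hessenberg_givens(A)
```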

(2) The symmetric orthogonal Householder matrices introduced in Section 5.11 may also be used to transform a general real matrix to Hessenberg form. Since $H_i^{-1} = H_i^T = H_i$, we take $A_{i+1} = H_iA_iH_i$. The pre-multiplication by $H_i$ changes the $i$-th through $n$-th rows of $A$ and zeros the $(i+1)$-st through $n$-th entries of the pivot column, if that is chosen to be the $i$-th column, while the post-multiplication by $H_i$, needed to preserve the eigenvalues, changes the $i$-th through $n$-th columns of $A$, and so would destroy the zeros in the $i$-th column created by the pre-multiplication. Hence, just as in the Givens rotation method above, we take $H_i$ to zero the $(i+1)$-st through $n$-th entries of the $(i-1)$-st column, which will be safe during the post-multiplication. Thus, $a_{i,i-1}$ cannot be zeroed, as before.
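A Householder version, again as a NumPy sketch with a hypothetical helper name. Each reflection $H_i = I - 2vv^T$ acts on rows and columns `col + 1` onward, so it zeros the entries of column `col` below the subdiagonal and the post-multiplication does not touch that column. The sign of `alpha` follows the usual choice that avoids cancellation.

```python
import numpy as np

def hessenberg_householder(A):
    """Reduce A to upper Hessenberg form H = Q^T A Q with Householder reflections (sketch)."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    Q = np.eye(n)
    for col in range(n - 2):                          # the text's (i-1)-st column
        x = H[col + 1:, col]
        alpha = -np.copysign(np.linalg.norm(x), x[0])
        v = x.copy()
        v[0] -= alpha                                 # v = x - alpha e_1
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:
            continue                                  # column already in Hessenberg shape
        v /= norm_v
        Hi = np.eye(n)
        Hi[col + 1:, col + 1:] -= 2.0 * np.outer(v, v)   # H_i = H_i^T = H_i^{-1}
        H = Hi @ H @ Hi                               # A_{i+1} = H_i A_i H_i
        Q = Q @ Hi
    return H, Q

rng = np.random.default_rng(6)
A = rng.standard_normal((5, 5))                       # hypothetical nonsymmetric test matrix
H, Q = hessenberg_householder(A)
```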

(3) The elementary matrices of Gaussian elimination may also be used to transform a general real matrix to Hessenberg form: recall that an elementary matrix $E_{ij}$ ($i < j$) and its inverse are of the form

$$E_{ij} = I + r\,e_je_i^T, \qquad E_{ij}^{-1} = I - r\,e_je_i^T,$$

that is, the identity matrix with an extra entry $r$ (respectively $-r$) in the $(j,i)$ position. Each transformation takes

$$A_{k+1} = E_{ij}^{-1}A_kE_{ij}.$$

The pre-multiplications by the $E_{ij}^{-1}$'s may reduce $A$ to an upper triangular form, but the post-multiplications by the $E_{ij}$'s put it back. Thus, choose $E_{ij}$ so that it zeros $a_{j,i-1}$ rather than $a_{ji}$; that is, we take

$$r = \frac{a_{j,i-1}}{a_{i,i-1}},$$

which works for our purpose: the pre-multiplication subtracts $r$ times the $i$-th row from the $j$-th row, while the post-multiplication only adds $r$ times the $j$-th column to the $i$-th column, so the new zero in column $i-1$ survives. We can now proceed to reduce $A$ to an upper Hessenberg form by eliminating subdiagonal entries column by column, beginning with the first column. In this case, as we might expect, we have to be careful with pivoting, i.e., exchanging the $i$-th and $j$-th rows may be necessary. This exchange of rows is equivalent to pre-multiplying $A$ by a permutation matrix, and so to transform $A$ by a similarity, we need to post-multiply by the same permutation matrix, which exchanges the $i$-th and $j$-th columns. This does not destroy the zeros created earlier in the preceding columns.
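The elimination version with row/column pivoting can be sketched in NumPy as follows (the helper name is hypothetical). Each step applies $E_{ij}^{-1}$ on the left and $E_{ij}$ on the right, and each row exchange is matched by the same column exchange, so the result is similar to $A$; the test compares characteristic polynomials via `numpy.poly`.

```python
import numpy as np

def hessenberg_gauss(A):
    """Reduce A to upper Hessenberg form by similarity with elementary matrices (sketch)."""
    H = np.array(A, dtype=float)
    n = H.shape[0]
    for col in range(n - 2):
        i = col + 1
        # pivoting: bring the largest |entry| of column col (rows i..n-1) to row i
        p = i + int(np.argmax(np.abs(H[i:, col])))
        if p != i:
            H[[i, p], :] = H[[p, i], :]      # P A     (exchange rows i and p)
            H[:, [i, p]] = H[:, [p, i]]      # (P A) P (exchange columns i and p)
        if H[i, col] == 0.0:
            continue                         # nothing left to eliminate in this column
        for j in range(i + 1, n):
            r = H[j, col] / H[i, col]        # r = a_{j,i-1} / a_{i,i-1}
            H[j, :] -= r * H[i, :]           # E_ij^{-1} A: row j <- row j - r row i
            H[:, i] += r * H[:, j]           # (...) E_ij: column i <- column i + r column j
    return H

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 5))              # hypothetical nonsymmetric test matrix
H = hessenberg_gauss(A)
```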


Now we discuss how to extract the eigenvalues of $A$ from this similar upper Hessenberg (or possibly tridiagonal) matrix. Most of the methods discussed here could be applied to the original matrix $A$ directly, but applying them to the Hessenberg matrix is much more efficient.




(1) A popular and powerful method is the QR-method: First we take 
a QR-decomposition of A: A = QR where Q is an orthogonal matrix and 
R is an upper triangular matrix (see Sections 5.10 and 5.11). 


Theorem 7.19 Let the eigenvalues of a real matrix $A$ be written as

$$|\lambda_1| > \cdots > |\lambda_n| > 0.$$

Let $A = A_0 = Q_0R_0$. Define $A_1 = R_0Q_0$, and, if $A_k = Q_kR_k$ for $k \ge 1$, define $A_{k+1} = R_kQ_k$. Then, for large $k$, $A_k$ converges to an upper triangular form.


Proof: Note that all the $A_k$'s are similar to $A$, since

$$Q_0^{-1}A_0Q_0 = Q_0^{-1}(Q_0R_0)Q_0 = R_0Q_0 = A_1.$$

Similarly, we have $A_k = Q_{k-1}^{-1}A_{k-1}Q_{k-1} = Q_{k-1}^{-1}\cdots Q_0^{-1}A_0Q_0\cdots Q_{k-1}$, or

$$Q_0\cdots Q_{k-1}A_k = AQ_0\cdots Q_{k-1}.$$

Thus, from this equation and $A_k = Q_kR_k$, we get

$$\begin{aligned} Q_0\cdots Q_{k-1}R_{k-1}\cdots R_0 &= (Q_0\cdots Q_{k-2}A_{k-1})R_{k-2}\cdots R_0\\ &= (AQ_0\cdots Q_{k-2})R_{k-2}\cdots R_0\\ &= AQ_0\cdots Q_{k-3}A_{k-2}R_{k-3}\cdots R_0\\ &= A^2Q_0\cdots Q_{k-3}R_{k-3}\cdots R_0\\ &\ \ \vdots\\ &= A^k. \end{aligned}$$

Since $A$ has all distinct eigenvalues, it is diagonalizable by a matrix $P$ whose columns are eigenvectors of $A$: $A = PDP^{-1}$, where the diagonal entries of $D$ are arranged in descending order of modulus. Take the LU decomposition $P^{-1} = LU$, in which no row exchanges are assumed: since $P^{-1}$ is nonsingular, a zero pivot is unlikely to arise (even if a row exchange is necessary, there is a way to continue this proof). Then

$$Q_0\cdots Q_{k-1}R_{k-1}\cdots R_0 = A^k = PD^kP^{-1} = PD^kLU.$$

Since $A$ is assumed to be nonsingular, so are $A^k$ and the $R_i$'s. Thus,

$$P^{-1}Q_0\cdots Q_{k-1} = D^kLUR_0^{-1}\cdots R_{k-1}^{-1} = (D^kLD^{-k})D^kUR_0^{-1}\cdots R_{k-1}^{-1} = H_k,$$




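and so

$$A_k = Q_{k-1}^{-1}\cdots Q_0^{-1}AQ_0\cdots Q_{k-1} = (PH_k)^{-1}A(PH_k) = H_k^{-1}(P^{-1}AP)H_k = H_k^{-1}DH_k.$$

However, the matrix $D^kLD^{-k}$ in $H_k$ takes the following form:

$$D^kLD^{-k} = \begin{bmatrix}1 & 0 & \cdots & 0\\ l_{21}\left(\frac{\lambda_2}{\lambda_1}\right)^k & 1 & \cdots & 0\\ l_{31}\left(\frac{\lambda_3}{\lambda_1}\right)^k & l_{32}\left(\frac{\lambda_3}{\lambda_2}\right)^k & \ddots & \vdots\\ \vdots & \vdots & & \\ l_{n1}\left(\frac{\lambda_n}{\lambda_1}\right)^k & l_{n2}\left(\frac{\lambda_n}{\lambda_2}\right)^k & \cdots\ \ l_{n,n-1}\left(\frac{\lambda_n}{\lambda_{n-1}}\right)^k & 1\end{bmatrix}.$$

Since the $|\lambda_i|$ are in descending order, the entries in the lower triangular part all tend to zero, so that the matrix converges to $I$ as $k\to\infty$. That is, $H_k \approx D^kUR_0^{-1}\cdots R_{k-1}^{-1}$ for large $k$, which is an upper triangular form, and so is $A_k = H_k^{-1}DH_k$.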


Intuitively, $A_k = Q_{k-1}^{-1}A_{k-1}Q_{k-1}$ is designed so that $Q_{k-1}^{-1}A_{k-1} = R_{k-1}$ becomes upper triangular, while the post-multiplication by $Q_{k-1}$ destroys this triangular structure. That is, while the pre-multiplication is shovelling the nonzero material below the diagonal over into the upper triangle systematically and purposefully, the post-multiplication is scattering the material about aimlessly. However, the directed efforts of the pre-multiplication gradually win out over the undirected efforts of the post-multiplication.
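The unshifted iteration of Theorem 7.19 takes only a few lines with NumPy's `numpy.linalg.qr`. This sketch uses a hypothetical test matrix built to have eigenvalues $5, 3, 1, 0.5$ (distinct moduli, as the theorem assumes); the subdiagonal entries shrink roughly like $(3/5)^k$, so 200 steps are plenty here.

```python
import numpy as np

def qr_iteration(A, iters=200):
    """Unshifted QR method: A_{k+1} = R_k Q_k where A_k = Q_k R_k."""
    Ak = np.array(A, dtype=float)
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q                  # = Q^{-1} A_k Q, similar to A_k
    return Ak

rng = np.random.default_rng(4)
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))    # random orthogonal matrix
A = V @ np.diag([5.0, 3.0, 1.0, 0.5]) @ V.T          # hypothetical test matrix
Ak = qr_iteration(A)
```

The eigenvalues appear on the diagonal in descending order of modulus, matching the ordering in the proof.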

The rate of convergence to zero of the lower triangular part depends on the magnitudes of the ratios $|\lambda_{i+1}/\lambda_i|$. To speed up the convergence, we employ a shifted QR method: first compute $A_k - \alpha_kI = Q_kR_k$, and then set

$$A_{k+1} = R_kQ_k + \alpha_kI = Q_k^{-1}(A_k - \alpha_kI)Q_k + \alpha_kI = Q_k^{-1}A_kQ_k.$$

That is, this shifted QR method with a constant $\alpha$ applies the ordinary QR method to $B = A - \alpha I$. This shifts the eigenvalues from $\lambda_i$ to $\lambda_i - \alpha$, and by choosing $\alpha$ close to an eigenvalue $\lambda_i$ we can make the $i$-th row of the lower triangular part of $D^kLD^{-k}$ converge to zero rapidly. By cleverly varying $\alpha_k$, we can speed selected subdiagonal entries on their way to zero.

The philosophy behind the shifts is easily seen by looking at the expansion of $D^kLD^{-k}$ given earlier: the $(n,n)$ entry of $A_k$ is the first to approach an eigenvalue. Thus, this entry is the simplest and most popular choice for the shift $\alpha_k$. Normally its effect is to produce quadratic convergence to the smallest eigenvalue. After three or four steps of the shifted algorithm, the matrix $A_k$ looks like:

$$A_k = \begin{bmatrix} * & * & \cdots & * & *\\ * & * & \cdots & * & *\\ & \ddots & \ddots & \vdots & \vdots\\ & & * & * & *\\ 0 & & & \epsilon & \lambda_n' \end{bmatrix}, \qquad |\epsilon| \ll 1,$$

where $\lambda_n'$ is a very close approximation to the true $\lambda_n$. To find the next eigenvalue, the shifted QR method continues with the smaller $(n-1)\times(n-1)$ submatrix in the upper left corner, and so on.

For a Hessenberg matrix each QR step takes $O(n^2)$ operations, while for a tridiagonal matrix it takes $O(n)$. Note that each new $A_k$ keeps the Hessenberg or tridiagonal form.
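A compact sketch of the shifted QR method with deflation, assuming NumPy. It uses the current $(n,n)$ entry as the shift, as described above, and deflates once the whole last row (not just the subdiagonal entry, so the sketch also works on a full matrix) is negligible. This simple shift is intended for matrices with real eigenvalues such as symmetric ones; production codes use more robust shifts (e.g. Wilkinson or double shifts), which are not covered in the text.

```python
import numpy as np

def shifted_qr_eigenvalues(A, tol=1e-12, max_iter=500):
    """Shifted QR with alpha_k = current (n, n) entry, deflating one eigenvalue at a time (sketch)."""
    H = np.array(A, dtype=float)
    scale = np.linalg.norm(H)
    eigs = []
    n = H.shape[0]
    while n > 1:
        for _ in range(max_iter):
            if np.max(np.abs(H[n - 1, :n - 1])) < tol * scale:
                break                                  # last row decoupled: deflate
            alpha = H[n - 1, n - 1]                    # shift alpha_k
            Q, R = np.linalg.qr(H[:n, :n] - alpha * np.eye(n))
            H[:n, :n] = R @ Q + alpha * np.eye(n)      # A_{k+1} = R_k Q_k + alpha_k I
        eigs.append(H[n - 1, n - 1])
        n -= 1                                         # continue with the leading block
    eigs.append(H[0, 0])
    return np.array(eigs)

rng = np.random.default_rng(8)
X = rng.standard_normal((6, 6))
S = X + X.T                                            # hypothetical symmetric test matrix
ev = shifted_qr_eigenvalues(S)
```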


(2) An LU method for extracting the eigenvalues from an upper Hessenberg or tridiagonal form is also available. We first take an LU decomposition of $A$ by Gaussian elimination. Then take

$$A_{k+1} = U_kL_k = L_k^{-1}A_kL_k, \quad\text{if } A_k = L_kU_k.$$

The convergence of $A_k$ to an upper triangular form follows equally well from Theorem 7.19, in whose proof the orthogonality of $Q_k$ is used nowhere.
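The LU iteration (often called the LR algorithm) can be sketched with a hand-written LU factorization without row exchanges, since NumPy itself has no LU routine. The test matrix is a hypothetical symmetric positive definite tridiagonal matrix with known eigenvalues $2 - 2\cos(j\pi/(n+1))$; for such matrices every pivot is positive, so no row exchanges are needed.

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle factorization A = L U without row exchanges (assumes nonzero pivots)."""
    n = A.shape[0]
    L, U = np.eye(n), np.array(A, dtype=float)
    for col in range(n - 1):
        for row in range(col + 1, n):
            L[row, col] = U[row, col] / U[col, col]
            U[row, :] -= L[row, col] * U[col, :]
    return L, np.triu(U)

def lr_iteration(A, iters=300):
    """A_{k+1} = U_k L_k = L_k^{-1} A_k L_k, where A_k = L_k U_k."""
    Ak = np.array(A, dtype=float)
    for _ in range(iters):
        L, U = lu_nopivot(Ak)
        Ak = U @ L
    return Ak

n = 5
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 2 on the diagonal, -1 beside it
Ak = lr_iteration(T)
```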


7.8.3 The power and inverse power method 


In applications, we may sometimes be interested in only one or a few of the eigenvalues of $A$, usually the largest or the smallest one. In those cases, the power or inverse power method may be more appropriate than the methods in the earlier sections. The power method is very simple: just pick a starting vector $x_0$ and repeatedly multiply it by $A$. The following theorem states that, unless we are very unlucky in the choice of $x_0$, this power method finds the eigenvalue of $A$ of largest modulus.


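Theorem 7.20 Suppose that $\lambda_1$ is the eigenvalue of $A$ of greatest absolute value, that is, $|\lambda_1| > |\lambda|$ for every other eigenvalue $\lambda$ of $A$. Let $x_{k+1} = Ax_k$. Then, for arbitrarily chosen vectors $x_0$ and $y$,

$$\frac{y^TAx_k}{y^Tx_k} \to \lambda_1 \quad\text{with probability 1}.$$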




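Proof: (1) Suppose $A$ is diagonalizable: $A = PDP^{-1}$, where the absolute values of the eigenvalues of $A$ in the diagonal of $D$ are in descending order. Then

$$\lim_{k\to\infty}\frac{D^k}{\lambda_1^k} = \lim_{k\to\infty}\begin{bmatrix}1 & & & \\ & \left(\frac{\lambda_2}{\lambda_1}\right)^k & & \\ & & \ddots & \\ & & & \left(\frac{\lambda_n}{\lambda_1}\right)^k\end{bmatrix} = E,$$

where $E$ is the diagonal matrix whose $(i,i)$ entry is 1 if $\lambda_i = \lambda_1$ and 0 otherwise. On the other hand, $\lim_{k\to\infty}\frac{A^k}{\lambda_1^k} = PEP^{-1} = G$, which is not the zero matrix, since otherwise $E = P^{-1}GP = 0$. Thus, $\dim N(G) \le n - 1$. Hence, if $x_0 \notin N(G)$ and $y$ is not orthogonal to $Gx_0$, which happens with probability 1, then $y^TGx_0 \neq 0$ and

$$\lim_{k\to\infty}\frac{y^Tx_{k+1}}{y^Tx_k} = \lim_{k\to\infty}\frac{y^TA^{k+1}x_0}{y^TA^kx_0} = \lim_{k\to\infty}\frac{y^T(A^{k+1}x_0/\lambda_1^{k+1})}{y^T(A^kx_0/\lambda_1^k)}\lambda_1 = \frac{y^TGx_0}{y^TGx_0}\lambda_1 = \lambda_1.$$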


(2) Suppose $A$ is not diagonalizable. Take the Jordan canonical form $J$, which will be discussed in the next chapter, such that $A = PJP^{-1}$, in which the eigenvalues of $A$ in $J$ are in descending order of modulus and the sizes of the blocks belonging to $\lambda_1$ are also in descending order. Then

$$A^k = PJ^kP^{-1} = P\begin{bmatrix}J_1^k & & 0\\ & \ddots & \\ 0 & & J_s^k\end{bmatrix}P^{-1},$$

where the $J_i^k$ are the powers of the Jordan blocks introduced in Theorem 8.1, and are of the form given in Section 8.3.1. If $\alpha$ is the order of the largest Jordan blocks of $\lambda_1$, the fastest-growing entries in $J^k$ are in the upper right corners of those blocks, and are of order $O(k^{\alpha-1}\lambda_1^k)$. Thus we divide $J^k$ by $k^{\alpha-1}\lambda_1^k$ and get

$$\lim_{k\to\infty}\frac{J^k}{k^{\alpha-1}\lambda_1^k} = \begin{bmatrix}E_1 & & & \\ & \ddots & & \\ & & E_r & \\ & & & 0\end{bmatrix} = E,$$

where every $E_i$ has zero entries except for one nonzero entry at the upper right-hand corner. Thus,

$$\lim_{k\to\infty}\frac{A^k}{k^{\alpha-1}\lambda_1^k} = PEP^{-1} = G.$$

Since $E$ is nonzero, so is $G$, and so we can choose $x_0$ and $y$ as in case (1) with probability 1. Hence

$$\lim_{k\to\infty}\frac{y^Tx_{k+1}}{y^Tx_k} = \lim_{k\to\infty}\frac{y^TA^{k+1}x_0}{y^TA^kx_0} = \lim_{k\to\infty}\frac{y^T\left(A^{k+1}x_0/((k+1)^{\alpha-1}\lambda_1^{k+1})\right)}{y^T\left(A^kx_0/(k^{\alpha-1}\lambda_1^k)\right)}\left(1 + \frac1k\right)^{\alpha-1}\lambda_1 = \lambda_1.$$

Therefore, for large $k$, $y^TAx_k \approx \lambda_1y^Tx_k$ for arbitrary $y$. That is, $Ax_k \approx \lambda_1x_k$, and so $x_k$ is an approximate eigenvector belonging to $\lambda_1$. Thus, we normally choose $y = x_k$. If $A$ is symmetric or $\lambda_1$ is a simple root, the most slowly decaying component in the limit of $A^kx_0/\lambda_1^k$ goes to zero like $|\lambda_2/\lambda_1|^k$. If $\lambda_1$ occurs in blocks of size $\alpha > 1$, the most stubborn components go to zero much more slowly, like $1/k$.
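A minimal power-method sketch in NumPy, using $y = x_k$ as suggested above. Rescaling $x_k$ at every step is an addition not discussed in the text; it avoids overflow and does not change the ratio $y^Tx_{k+1}/y^Tx_k$. The $2\times2$ test matrix is hypothetical, with eigenvalues $(5 \pm \sqrt5)/2$.

```python
import numpy as np

def power_method(A, iters=100, seed=0):
    """Power method x_{k+1} = A x_k with the estimate x_k^T x_{k+1} / x_k^T x_k."""
    A = np.asarray(A, dtype=float)
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])      # random x_0
    lam = 0.0
    for _ in range(iters):
        x = x / np.linalg.norm(x)            # rescale; the ratio below is unaffected
        x_next = A @ x
        lam = (x @ x_next) / (x @ x)         # y^T x_{k+1} / y^T x_k with y = x_k
        x = x_next
    return lam, x / np.linalg.norm(x)

A = np.array([[2.0, 1.0], [1.0, 3.0]])       # eigenvalues (5 +/- sqrt(5)) / 2
lam, v = power_method(A)
```

Here $|\lambda_2/\lambda_1| \approx 0.38$, so the error decays quickly; a ratio close to 1 would make the method slow.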

When $A$ has a complex eigenvalue, or we want to find the smallest eigenvalue of $A$, we may use the inverse $A^{-1}$ instead of $A$:

$$x_{k+1} = A^{-1}x_k, \quad\text{or}\quad Ax_{k+1} = x_k.$$


Theorem 7.21 Suppose $A$ has an eigenvalue $\lambda_p$ which is closer to $p$ than all other eigenvalues, and let

$$(A - pI)x_{k+1} = x_k.$$

Then one can choose $x_0$ and $y$ randomly, with probability 1, such that

$$\frac{y^Tx_k}{y^Tx_{k+1}} + p \to \lambda_p.$$

Proof: Since $x_{k+1} = (A - pI)^{-1}x_k$, by Theorem 7.20,

$$\frac{y^Tx_{k+1}}{y^Tx_k}$$

converges to the eigenvalue of $(A - pI)^{-1}$ of largest modulus. But the eigenvalues of $(A - pI)^{-1}$ are $\frac{1}{\lambda_i - p}$ for the eigenvalues $\lambda_i$ of $A$. Thus, the eigenvalue of $(A - pI)^{-1}$ of largest modulus is $\frac{1}{\lambda_p - p}$, and so

$$\frac{y^Tx_{k+1}}{y^Tx_k} \to \frac{1}{\lambda_p - p}, \quad\text{or}\quad \frac{y^Tx_k}{y^Tx_{k+1}} + p \to \lambda_p.$$


If $p = 0$, the inverse power method $x_{k+1} = A^{-1}x_k$ allows us to find the eigenvalue of $A$ which is smallest in modulus (closest to 0), and if $p$ is chosen to be a complex number, it can even converge to a complex eigenvalue.

Using the shifted inverse power method in Theorem 7.21, we can find any eigenvalue of $A$ by choosing $p$ closer to the desired eigenvalue than to any other. However, since we do not usually know exactly where the eigenvalues are, this process may be somewhat like going fishing. The closer we choose $p$ to an eigenvalue, the faster the convergence. Hence, we can update $p$ at each step of the iteration, setting it equal to the current best approximation of $\lambda_p$.
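The shifted inverse power method of Theorem 7.21 can be sketched with `numpy.linalg.solve`, solving $(A - pI)x_{k+1} = x_k$ rather than forming the inverse. The shift is kept fixed here (the updating strategy just mentioned is not implemented), and the function name and test matrices are hypothetical. The second call shows a complex shift picking out the eigenvalue $i$ of a real rotation matrix.

```python
import numpy as np

def shifted_inverse_power(A, p, iters=50, seed=0):
    """Return y^T x_k / y^T x_{k+1} + p, which tends to the eigenvalue of A closest to p."""
    A = np.asarray(A, dtype=complex)
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n).astype(complex)   # random x_0
    y = rng.standard_normal(n)                   # random y
    M = A - p * np.eye(n)
    lam = complex(p)
    for _ in range(iters):
        x_next = np.linalg.solve(M, x)           # (A - pI) x_{k+1} = x_k
        lam = (y @ x) / (y @ x_next) + p
        x = x_next / np.linalg.norm(x_next)      # rescaling does not change the ratio
    return lam, x

lam_small, _ = shifted_inverse_power([[2.0, 1.0], [1.0, 3.0]], p=1.0)           # eigenvalue nearest 1
lam_complex, _ = shifted_inverse_power([[0.0, -1.0], [1.0, 0.0]], p=0.1 + 0.9j)  # eigenvalues are i, -i
```

Each step costs one linear solve; in practice one would factor $A - pI$ once (e.g. by LU) and reuse it.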


7.9 Exercises 


7.1. Calculate ||x|| for 


oae masla 


7.2. Construct an orthonormal basis for $\mathbb{C}^2$ from $\{(i, 4 + 2i), (5 + 6i, 1)\}$ by applying the Gram-Schmidt orthogonalization.
i 1 1—-i Lti 
7.3. Find the rank of the matrix A = l—i l1+i 1 2+1 
tpai 1-i 2-71 1+4i 


7.4. Find the eigenvalues and eigenvectors for each of the following matrices:

-2 -1 0 4
Ori vale. Sve eb
5 -5 -9 0 -i 0
B) l-1 4 2|, @li li
3 -5 -3 0 -i 0

7.5. Find the third column vector $v$ so that the following matrix $U$ is unitary:

e a l
VB V2
U = 5 0 v |
1 1
vee |

How much freedom is there in this choice?

7.6. Find a real matrix $A$ such that $A + rI$ is invertible for all $r \in \mathbb{R}$. Does there exist a square matrix $A$ such that $A + cI$ is invertible for all $c \in \mathbb{C}$?

7.7. Find a unitary matrix whose first row is
(1) $k(1, 1 - i)$, where $k$ is a number, (2) ($, 4, 4).

7.8. Let $V = \mathbb{C}^2$ with the Euclidean inner product. Let $T$ be the linear transformation on $V$ with the following matrix representation $A$ with respect to the standard basis:

T b
; 1
A = | 1

Show that $T$ is normal and find a set of orthonormal eigenvectors of $T$.

7.9. Prove that the following matrices are unitarily similar:
$$\begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}, \qquad \begin{bmatrix}e^{i\theta} & 0\\ 0 & e^{-i\theta}\end{bmatrix},$$
where $\theta$ is a real number.

7.10. For each of the following real symmetric matrices $A$, find a real orthogonal matrix $Q$ such that $Q^TAQ$ is diagonal:

oh eel

7.11. For each of the following Hermitian matrices $A$, find a unitary matrix $U$ such that $U^HAU$ is diagonal:

1 a z+
(1) E T (2) P i e Sh 6 Behe
: ‘ 2-4 14+i 2

7.12. Find the diagonal matrices to which the following matrices are unitarily similar. Determine whether each of them is Hermitian, unitary or orthogonal.

1 i 0
Lf iti 1-i 0.6 -0.8
wlis eal (2) be ae (3) a 2 i

7.13. For a skew-Hermitian matrix $A$, show that
(1) $A - I$ is invertible, (2) $e^A$ is unitary.


7.14. Let $U$ be a unitary matrix. Prove that $U$ and $U^T$ have the same set of eigenvalues.

7.15. Verify that the following matrix $A$ is normal, and diagonalize $A$ by a unitary matrix $U$:

2
A = | i : |
2

7.16. Show that the non-symmetric real matrix
$$A = \begin{bmatrix}1 & 0 & 0\\ 1 & 3 & 3\\ -2 & -4 & -5\end{bmatrix}$$
can be diagonalized.

7.17. (1) Prove that $A$ is normal if and only if there is a polynomial $g$ such that $A^H = g(A)$.
(2) Suppose that $A$ is normal and $B$ is a square matrix such that $AB = BA$. Show that $A^HB = BA^H$.

7.18. Suppose that $A$ and $B$ are normal $n\times n$ matrices. Prove that $AB = BA$ if and only if $A$ and $B$ can be diagonalized simultaneously by the same unitary matrix $U$, i.e., $U^HAU$ and $U^HBU$ are diagonal matrices.

7.19. A circulant matrix of order $n$ is an $n\times n$ matrix $A$ in which the $i$-th row is obtained from the first row by $i - 1$ cyclic shifts. Thus it takes the following form:
$$A = \begin{bmatrix}a_1 & a_2 & a_3 & \cdots & a_n\\ a_n & a_1 & a_2 & \cdots & a_{n-1}\\ a_{n-1} & a_n & a_1 & \cdots & a_{n-2}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ a_2 & a_3 & a_4 & \cdots & a_1\end{bmatrix}.$$
(1) Show that any circulant matrix is normal.
(2) Find all eigenvalues of the $n\times n$ circulant matrix
$$W = \begin{bmatrix}0 & 1 & 0 & \cdots & 0\\ 0 & 0 & 1 & \cdots & 0\\ \vdots & \vdots & & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & 1\\ 1 & 0 & 0 & \cdots & 0\end{bmatrix}.$$
(3) Use your answer to find the eigenvalues of
$$\begin{bmatrix}0 & 1 & 1 & \cdots & 1\\ 1 & 0 & 1 & \cdots & 1\\ \vdots & & \ddots & & \vdots\\ 1 & 1 & \cdots & 0 & 1\\ 1 & 1 & \cdots & 1 & 0\end{bmatrix}.$$


(4) Find all eigenvalues of the circulant matrix $A$ by showing that
$$A = \sum_{i=1}^na_iW^{i-1}.$$
(5) Show that
$$\det A = \prod_{j=0}^{n-1}\left(a_1 + a_2\omega_j + \cdots + a_n\omega_j^{n-1}\right),$$
where $\omega_j = e^{2\pi ij/n}$, $j = 0, 1, \ldots, n-1$, are the $n$-th roots of unity.

7.20. Find the spectral decomposition of $A = \begin{bmatrix}2 & 1 & 1\\ 1 & 2 & 1\\ 1 & 1 & 2\end{bmatrix}$.

7.21. Show that a matrix $A$ is nilpotent, i.e., $A^k = 0$ for some positive integer $k$, if and only if its eigenvalues are all zero.

7.22. Every $n\times m$ matrix $A$ of rank $k$ can be decomposed into a product $LU$, where the $k$ columns of the $n\times k$ matrix $L$ form a basis for the column space of $A$ and the $k$ rows of the $k\times m$ matrix $U$ form a basis for the row space of $A$ (see Problem 2.24). Show that
$$A^+ = U^H(UU^H)^{-1}(L^HL)^{-1}L^H.$$

7.23. Determine whether the following statements are true or false, in general, and justify your answers.
(1) A Hermitian matrix is always unitarily similar to a diagonal matrix.
(2) An orthogonal matrix is always unitarily similar to a real diagonal matrix.
(3) For an $n\times n$ square matrix $A$, $AA^H$ and $A^HA$ have the same eigenvalues.
(4) If a triangular matrix is similar to a diagonal matrix, it is already diagonal.
(5) If all the columns of a square matrix $A$ are orthonormal, then $A$ is diagonalizable.
(6) Every permutation matrix is diagonalizable.
(7) Every permutation matrix is Hermitian.
(8) A nonzero nilpotent matrix cannot be Hermitian.
(9) Every square matrix is unitarily similar to a triangular matrix.
(10) If $A$ is a Hermitian matrix, then $A + iI$ is invertible.
(11) If $A$ is a real matrix, then $A + iI$ is invertible.
(12) If $A$ is an orthogonal matrix, then $A + \frac12I$ is invertible.
(13) Every unitarily diagonalizable matrix is Hermitian.
(14) Every diagonalizable matrix is normal.


Chapter 8 


Jordan Canonical Forms 


8.1 Introduction 


Most problems involving linear systems of equations can be easily handled if the matrices in the systems are diagonalizable, as shown in previous chapters. However, for a non-diagonalizable matrix $A$, it is not easy, for example, to compute the exponential matrix $e^A$ or the powers $A^n$, so a general solution $x(t) = e^{tA}c$ of a system $x'(t) = Ax(t)$ of linear differential equations may not be easily found. In this chapter, we will discuss how to handle such a non-diagonalizable matrix.

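Recall that an $n\times n$ matrix $A$ is diagonalizable if and only if $A$ has a full set of $n$ linearly independent eigenvectors. Hence, if $\lambda_1, \ldots, \lambda_\ell$ are all the distinct eigenvalues of a matrix $A$ with multiplicities $m_{\lambda_1}, \ldots, m_{\lambda_\ell}$, respectively, then Theorem 6.5 says that $A$ is diagonalizable if and only if

$$\dim E_{\lambda_i} = m_{\lambda_i} \ \text{ for all } i = 1, \ldots, \ell, \quad\text{or}\quad \mathbb{C}^n = E_{\lambda_1}\oplus\cdots\oplus E_{\lambda_\ell}.$$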

Therefore, a matrix $A$ is not diagonalizable if and only if $A$ has an eigenvalue $\lambda$ with multiplicity $m_\lambda > 1$ such that

$$1 \le \dim E_\lambda < m_\lambda,$$

that is, the number of linearly independent eigenvectors belonging to $\lambda$ is less than $m_\lambda$. In this case, it is not possible to find a transition matrix $Q$ such that $Q^{-1}AQ = D$, since the columns of $Q$ would have to be $n$ linearly independent eigenvectors.
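The criterion $\dim E_\lambda < m_\lambda$ can be tested numerically through $\dim E_\lambda = n - \operatorname{rank}(A - \lambda I)$. This is a minimal NumPy sketch; the helper name and the two $3\times3$ matrices are hypothetical examples with the same eigenvalues $2, 2, 3$. A numerical rank needs a tolerance, so for matrices with computed (inexact) eigenvalues this test is only a heuristic.

```python
import numpy as np

def eigenspace_dim(A, lam, tol=1e-10):
    """dim E_lambda = dim N(A - lam I) = n - rank(A - lam I)."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(A - lam * np.eye(n), tol=tol)

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 3.0]])     # eigenvalue 2 has multiplicity 2 but dim E_2 = 1
B = np.diag([2.0, 2.0, 3.0])       # same eigenvalues, dim E_2 = 2
```

So $A$ is not diagonalizable while $B$ is, even though they have the same characteristic polynomial.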

However, Theorem 8.1 below is a remarkable theorem in linear algebra: it says that any matrix, even a non-diagonalizable one, is similar to a matrix very "close" to a diagonal matrix, called a Jordan canonical form. The transition matrix $Q$ consists of a maximal set of linearly independent eigenvectors and some additional vectors, the so-called generalized eigenvectors, which fill up the deficient columns of $Q$. The proof of the theorem may be beyond a beginning level.


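Theorem 8.1 If a square matrix $A$ of order $n$ has a maximal set of $s$ linearly independent eigenvectors, then it is similar to a matrix $J$ of the following form, called the Jordan canonical form, or a Jordan canonical matrix:

$$J = Q^{-1}AQ = \begin{bmatrix}J_1 & & 0\\ & \ddots & \\ 0 & & J_s\end{bmatrix},$$

in which each $J_i$, called a Jordan block, is a triangular matrix of the form

$$J_i = \begin{bmatrix}\lambda_i & 1 & & 0\\ & \lambda_i & \ddots & \\ & & \ddots & 1\\ 0 & & & \lambda_i\end{bmatrix},$$

where $\lambda_i$ is a single eigenvalue of $A$.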


Leaving the proof of this theorem to more advanced linear algebra books, we only illustrate how to find the Jordan canonical form and the transition matrix, using the statements of the theorem.


Remark: (1) For a given matrix $A$, its Jordan canonical form $J$ is completely determined by a maximal set of linearly independent eigenvectors of $A$ and their ranks: the number of Jordan blocks in $J$ is equal to the maximal number $s$ of linearly independent eigenvectors of $A$, and each of these eigenvectors is associated with a Jordan block whose order is the same as the 'rank' of the eigenvector (which will be discussed in Section 8.2).

Thus the number of Jordan blocks with the same eigenvalue $\lambda$ is equal to the dimension of the eigenspace $E_\lambda = N(\lambda I - A)$, that is, the number of linearly independent eigenvectors of $A$ belonging to $\lambda$, and the sum of the orders of all Jordan blocks associated with an eigenvalue $\lambda$ is equal to the multiplicity $m_\lambda$ of $\lambda$.

(2) In particular, if A has a full set of n linearly independent eigenvectors, 
then there are n Jordan blocks, each of which is a 1 x 1 matrix, so that the 


8.1. Introduction 323 


Jordan canonical form is just the diagonal matrix with eigenvalues on the 
diagonal. That is, a diagonal matrix is a particular case of the Jordan 
canonical form. 

(3) In the transition matrix $Q$, the missing columns for each eigenvalue $\lambda$ with multiplicity $m_\lambda > 1$ are filled in by the so-called generalized eigenvectors, which will be discussed in the next section. Assuming Theorem 8.1, one can first determine the Jordan canonical form of $A$ as shown below, and then try to find the generalized eigenvectors as discussed in the next section.

(4) If we put the columns of $Q$ in reverse order, obtaining a matrix $Q'$, then one can easily see that $Q'^{-1}AQ'$ is the transpose $J^T$ (with the blocks in reverse order), which is also called the Jordan canonical form. However, $J$ and $J^T$ are similar to each other via a permutation matrix. Thus we say that any (complex) square matrix $A$ is similar to a unique Jordan canonical matrix up to a permutation matrix. In this sense, it is called the Jordan canonical form of the matrix $A$.


Example 8.1 For a $5 \times 5$ matrix $A$ that has a single eigenvalue $\lambda$ of multiplicity 5, classify all possible Jordan canonical forms of $A$ up to permutations of the Jordan blocks.


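Solution: (1) If $A$ has only one linearly independent eigenvector belonging to $\lambda$, i.e., $\dim E_\lambda = 1$, then the Jordan canonical form of $A$ has only one block:

$$J^{(1)} = Q^{-1}AQ = \begin{bmatrix} \lambda & 1 & 0 & 0 & 0 \\ 0 & \lambda & 1 & 0 & 0 \\ 0 & 0 & \lambda & 1 & 0 \\ 0 & 0 & 0 & \lambda & 1 \\ 0 & 0 & 0 & 0 & \lambda \end{bmatrix}.$$

(2) If it has two linearly independent eigenvectors belonging to $\lambda$, i.e., $\dim E_\lambda = 2$, then its Jordan canonical form has two blocks, in one of the forms

$$J^{(2)} = \begin{bmatrix} \lambda & 1 & 0 & 0 & 0 \\ 0 & \lambda & 0 & 0 & 0 \\ 0 & 0 & \lambda & 1 & 0 \\ 0 & 0 & 0 & \lambda & 1 \\ 0 & 0 & 0 & 0 & \lambda \end{bmatrix} \quad \text{or} \quad J^{(3)} = \begin{bmatrix} \lambda & 0 & 0 & 0 & 0 \\ 0 & \lambda & 1 & 0 & 0 \\ 0 & 0 & \lambda & 1 & 0 \\ 0 & 0 & 0 & \lambda & 1 \\ 0 & 0 & 0 & 0 & \lambda \end{bmatrix}.$$

These two matrices $J^{(2)}$ and $J^{(3)}$ cannot be similar, because $(J^{(2)} - \lambda I)^3 = 0$ but $(J^{(3)} - \lambda I)^3 \neq 0$. (One can verify this by direct computation.)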




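(3) If it has three linearly independent eigenvectors belonging to $\lambda$, i.e., $\dim E_\lambda = 3$, then its Jordan canonical form has three blocks, in one of the forms

$$J^{(4)} = \begin{bmatrix} \lambda & 0 & 0 & 0 & 0 \\ 0 & \lambda & 0 & 0 & 0 \\ 0 & 0 & \lambda & 1 & 0 \\ 0 & 0 & 0 & \lambda & 1 \\ 0 & 0 & 0 & 0 & \lambda \end{bmatrix} \quad \text{or} \quad J^{(5)} = \begin{bmatrix} \lambda & 0 & 0 & 0 & 0 \\ 0 & \lambda & 1 & 0 & 0 \\ 0 & 0 & \lambda & 0 & 0 \\ 0 & 0 & 0 & \lambda & 1 \\ 0 & 0 & 0 & 0 & \lambda \end{bmatrix}.$$

Again, these two matrices $J^{(4)}$ and $J^{(5)}$ are not similar, because $(J^{(5)} - \lambda I)^2 = 0$ but $(J^{(4)} - \lambda I)^2 \neq 0$.

(4) If it has four linearly independent eigenvectors belonging to $\lambda$, i.e., $\dim E_\lambda = 4$, then its Jordan canonical form has four blocks:

$$J^{(6)} = \begin{bmatrix} \lambda & 0 & 0 & 0 & 0 \\ 0 & \lambda & 0 & 0 & 0 \\ 0 & 0 & \lambda & 0 & 0 \\ 0 & 0 & 0 & \lambda & 1 \\ 0 & 0 & 0 & 0 & \lambda \end{bmatrix}.$$

(5) If it has five linearly independent eigenvectors belonging to $\lambda$, i.e., $\dim E_\lambda = 5$, then its Jordan canonical form has five blocks, each of which is just a $1 \times 1$ matrix; that is, it is the diagonal matrix

$$J^{(7)} = \begin{bmatrix} \lambda & 0 & 0 & 0 & 0 \\ 0 & \lambda & 0 & 0 & 0 \\ 0 & 0 & \lambda & 0 & 0 \\ 0 & 0 & 0 & \lambda & 0 \\ 0 & 0 & 0 & 0 & \lambda \end{bmatrix}.$$

Thus, seven Jordan canonical forms are possible. Note that all seven possible Jordan canonical matrices have the same trace, determinant, characteristic polynomial, and eigenvalues as the matrix $A$, but no two of them are similar to each other.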
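The classification in Example 8.1 amounts to listing the ways of writing 5 as a sum of block orders, that is, the integer partitions of 5, where the number of parts equals $\dim E_\lambda$. A short Python sketch that enumerates them (the function name `partitions` is our own):

```python
def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples (the orders of the Jordan blocks)."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


forms = list(partitions(5))
for p in forms:
    # len(p) blocks means dim E_lambda = len(p) linearly independent eigenvectors
    print(f"dim E = {len(p)}: block orders {p}")
print(len(forms))  # 7
```

The largest part of each partition is the smallest $k$ with $(J - \lambda I)^k = 0$, which is exactly what separates $J^{(2)}$ from $J^{(3)}$ and $J^{(4)}$ from $J^{(5)}$.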


As shown in case (2) (and also in (3)) of Example 8.1, when $\dim E_\lambda = 2$, two nonsimilar Jordan canonical matrices $J^{(2)}$ and $J^{(3)}$ are possible. To determine the correct Jordan canonical form of a given matrix $A$, one has to look at the ranks of $(A - \lambda I)^k$ for $k = 1, 2, \ldots, n$, as follows:




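Example 8.2 (The orders of Jordan blocks) Let $A$ be a matrix with two distinct eigenvalues $\lambda$ and $\mu$ that is similar to the Jordan canonical matrix $J$ with four Jordan blocks:

$$J = \begin{bmatrix} J_1 & & & 0 \\ & J_2 & & \\ & & J_3 & \\ 0 & & & J_4 \end{bmatrix}, \qquad J_1 = \begin{bmatrix} \lambda & 1 & 0 \\ 0 & \lambda & 1 \\ 0 & 0 & \lambda \end{bmatrix}, \quad J_2 = J_3 = \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}, \quad J_4 = [\mu].$$

This shows that $\lambda$ is of multiplicity $m_\lambda = 7$ and has 3 linearly independent eigenvectors, while $\mu$ is of multiplicity $m_\mu = 1$. Then

$$(J - \lambda I)^2 = \begin{bmatrix} (J_1 - \lambda I)^2 & & & 0 \\ & 0 & & \\ & & 0 & \\ 0 & & & \tau^2 \end{bmatrix}, \qquad (J_1 - \lambda I)^2 = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},$$

where $\tau = \mu - \lambda \neq 0$. Hence, if we write $c_\lambda = n - m_\lambda$ $(= 8 - 7 = 1)$, then $\operatorname{rank}(J - \lambda I)^3 - c_\lambda = 1 - 1 = 0$ and $\operatorname{rank}(J - \lambda I)^2 - c_\lambda = 2 - 1 = 1$ imply that there is only one largest block, of order 3. Note that $\operatorname{rank}(J - \lambda I) - c_\lambda = 5 - 1 = 4$ is equal to twice the number of blocks of order 3 plus the number of blocks of order 2, so that there are two blocks of order 2. Finally, the number of blocks of order 1 is $7 - (2 \times 2) - 3 = 0$.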


In general, if $\lambda$ is an eigenvalue of multiplicity $m_\lambda$ of an $n \times n$ matrix $A$ and $J = Q^{-1}AQ$ is its Jordan canonical form, then $\operatorname{rank}(A - \lambda I)^k = \operatorname{rank}(J - \lambda I)^k$ for any positive integer $k$, since $(A - \lambda I)^k = Q(J - \lambda I)^kQ^{-1}$ and similar matrices have the same rank. Thus, to decide the order of each Jordan block in $J$, it is useful to examine the sequence $\{\operatorname{rank}(J - \lambda I)^k : k = 1, \ldots, m_\lambda\}$, as follows.

One can easily show that the power $(J_i - \lambda I)^k$ of each Jordan block $J_i$ belonging to $\lambda$ becomes the zero matrix for some $k \le m_\lambda$. In fact, $\operatorname{rank}(J - \lambda I)^k$ decreases as $k$ increases, and stops decreasing at $c_\lambda = n - m_\lambda$ for some $k$ at most $m_\lambda$, while all the other blocks, belonging to eigenvalues different from $\lambda$, remain upper triangular matrices with nonzero diagonal entries, whose ranks sum to $c_\lambda = n - m_\lambda$. This justifies that the sequence $\{\operatorname{rank}(A - \lambda I)^k - c_\lambda : k = 1, \ldots, m_\lambda\}$ completely determines the orders of the blocks of $J$ belonging to $\lambda$:


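(i) If $k$ is the smallest positive integer such that $\operatorname{rank}(A - \lambda I)^k - c_\lambda = \ell_0 = 0$, and $\operatorname{rank}(A - \lambda I)^{k-1} - c_\lambda = \ell_1 \neq 0$, then $k$ is the order of the largest blocks belonging to $\lambda$, and $\ell_1$ is the number of such blocks.

(ii) $\operatorname{rank}(A - \lambda I)^{k-2} - c_\lambda - 2\ell_1 = \ell_2$ is equal to the number of blocks of order $k - 1$.

(iii) $\operatorname{rank}(A - \lambda I)^{k-3} - c_\lambda - 3\ell_1 - 2\ell_2 = \ell_3$ is equal to the number of blocks of order $k - 2$, and so on.

(iv) If $\ell_1, \ldots, \ell_{i-1}$ are determined, then

$$\ell_i = \operatorname{rank}(A - \lambda I)^{k-i} - c_\lambda - i\ell_1 - (i-1)\ell_2 - \cdots - 2\ell_{i-1}$$

is the number of blocks of order $k - (i - 1)$, for $i = 1, \ldots, k$. (For $i = k$ we use $(A - \lambda I)^0 = I$, so that $\operatorname{rank}(A - \lambda I)^0 - c_\lambda = m_\lambda$.)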
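Rules (i)-(iv) translate directly into a short recursion. The sketch below is a hypothetical helper of our own (`block_counts`), not taken from the text; it takes the numbers $\operatorname{rank}(A - \lambda I)^j - c_\lambda$ for $j = 1, 2, \ldots$, uses $m_\lambda$ for $j = 0$, and assumes the list eventually reaches 0 (which happens by $j = m_\lambda$).

```python
def block_counts(r, m):
    """Apply rules (i)-(iv).

    r[j-1] = rank(A - lam I)^j - c_lam for j = 1, 2, ...;  m = m_lam (the j = 0 value).
    Returns {order: number of Jordan blocks of that order belonging to lam}.
    """
    seq = [m] + list(r)  # seq[j] = rank(A - lam I)^j - c_lam
    k = next(j for j in range(1, len(seq)) if seq[j] == 0)  # rule (i): order of largest blocks
    ell = [0]  # ell[i] for i >= 1; ell[0] is unused here
    counts = {}
    for i in range(1, k + 1):
        # rule (iv): ell_i = seq[k-i] - i*ell_1 - (i-1)*ell_2 - ... - 2*ell_{i-1}
        value = seq[k - i] - sum((i - p + 1) * ell[p] for p in range(1, i))
        ell.append(value)
        counts[k - (i - 1)] = value
    return counts


# Example 8.2: m_lam = 7 and rank(J - lam I)^j - c_lam = 4, 1, 0 for j = 1, 2, 3.
print(block_counts([4, 1, 0], 7))  # {3: 1, 2: 2, 1: 0}
```

The output reproduces Example 8.2: one block of order 3, two of order 2, and none of order 1.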


In summary, one can determine the Jordan canonical form $J$ of an $n \times n$ matrix $A$ by the following procedure:

Step 1 Find all distinct eigenvalues $\lambda_1, \ldots, \lambda_t$ of $A$ together with their multiplicities $m_{\lambda_1}, \ldots, m_{\lambda_t}$, respectively, so that $m_{\lambda_1} + \cdots + m_{\lambda_t} = n$.

Step 2 For each eigenvalue $\lambda_i$, find the sequence $\{\operatorname{rank}(A - \lambda_iI)^k - c_{\lambda_i} : k = 1, 2, \ldots, m_{\lambda_i}\}$. By the above criteria, this sequence completely determines the number and the orders of all the Jordan blocks of $A$ belonging to the eigenvalue $\lambda_i$.
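Step 2 can be sketched in plain Python with exact rational arithmetic. The helper names are our own, and the eigenvalue and its multiplicity (Step 1) are passed in by hand, since finding the roots of the characteristic polynomial is a separate problem. To turn the rank numbers into block orders, this sketch uses a difference form equivalent to rules (i)-(iv): with $r_j = \operatorname{rank}(A - \lambda I)^j - c_\lambda$ and $r_0 = m_\lambda$, the number $r_{j-1} - r_j$ counts the blocks of order at least $j$.

```python
from fractions import Fraction


def rank(rows):
    """Rank by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def jordan_block_orders(A, lam, m):
    """Orders of the Jordan blocks of A belonging to lam (multiplicity m), largest first."""
    n = len(A)
    c = n - m  # c_lam
    B = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    r = [m]  # r[0] = m_lam
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(m):
        P = matmul(P, B)  # P = (A - lam I)^k
        r.append(rank(P) - c)
    at_least = [r[j - 1] - r[j] for j in range(1, m + 1)] + [0]  # blocks of order >= j
    orders = []
    for j in range(m, 0, -1):
        orders += [j] * (at_least[j - 1] - at_least[j])
    return orders


A = [[2, 1, 4], [0, 2, -1], [0, 0, 3]]  # Example 8.3
print(jordan_block_orders(A, 2, 2), jordan_block_orders(A, 3, 1))  # [2] [1]
```

For large matrices, repeated exact powers become expensive, which matches the caution in the text about evaluating $(A - \lambda I)^k$.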


This is a general guide for determining the Jordan canonical form of a matrix. However, for a matrix of large order, the evaluation of $(A - \lambda I)^k$ might not be simple at all, while for matrices of lower order or relatively simple matrices the computations may be easy.


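Example 8.3 Find the Jordan canonical form of the matrix

$$A = \begin{bmatrix} 2 & 1 & 4 \\ 0 & 2 & -1 \\ 0 & 0 & 3 \end{bmatrix}.$$

Solution: Clearly, the eigenvalues of $A$ are $\lambda = 2$ and $\mu = 3$, of multiplicities 2 and 1, respectively. Hence, there are two possibilities for the Jordan canonical form of $A$:

$$J^{(1)} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} \quad \text{or} \quad J^{(2)} = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}.$$

But, for $c_\lambda = 3 - 2 = 1$, $\operatorname{rank}(A - 2I)^2 - c_\lambda = 0$ and $\operatorname{rank}(A - 2I) - c_\lambda = 1$. Thus, there is one Jordan block belonging to $\lambda = 2$, of order 2, and so the Jordan canonical form of $A$ must be $J^{(2)}$.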


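Example 8.4 Determine the Jordan canonical form $J$ of the matrix

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 4 & -6 & 4 \end{bmatrix}.$$

Solution: The characteristic polynomial of the matrix $A$ is

$$\det(A - \lambda I) = \lambda^4 - 4\lambda^3 + 6\lambda^2 - 4\lambda + 1 = (\lambda - 1)^4.$$

Thus, the only eigenvalue of $A$ is $\lambda = 1$, of multiplicity 4. Now the rank of the matrix

$$A - I = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ -1 & 4 & -6 & 3 \end{bmatrix}$$

is 3, since the first three columns are linearly independent and they span the last column. Hence $\dim N(A - I) = \dim E_1 = 1$, i.e., $A$ has only one linearly independent eigenvector, so that the Jordan canonical form $J$ has only one block:

$$J = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$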


Problem 8.1 Let $A$ be a $5 \times 5$ matrix with two distinct eigenvalues, $\lambda$ of multiplicity 3 and $\mu$ of multiplicity 2. Determine all possible Jordan canonical forms of $A$ up to permutations of the Jordan blocks.




Problem 8.2 Find the Jordan canonical form for each of the following matrices: 


i 0 AA E 
wh h E e A 

a 00 4 0 0 1 1 

0 0 0 2 


8.2 Generalized eigenvectors 


In the previous section, assuming Theorem 8.1, we have shown how to determine the Jordan canonical form of a matrix $A$ by analyzing the sequence $\{\operatorname{rank}(A - \lambda I)^k : k = 1, 2, \ldots, m_\lambda\}$ for each eigenvalue $\lambda$ of $A$.

In this section, we discuss how to find a transition matrix Q, and provide 
a theoretical basis for the validity of the method. 


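Example 8.5 (Transition matrix for a Jordan block) Let $A$ be a $5 \times 5$ matrix similar to a Jordan block of the form

$$J = Q^{-1}AQ = \begin{bmatrix} \lambda & 1 & 0 & 0 & 0 \\ 0 & \lambda & 1 & 0 & 0 \\ 0 & 0 & \lambda & 1 & 0 \\ 0 & 0 & 0 & \lambda & 1 \\ 0 & 0 & 0 & 0 & \lambda \end{bmatrix}.$$

Determine the transition matrix $Q = [x_1\ x_2\ x_3\ x_4\ x_5]$.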


Solution: Clearly, $\lambda$ is the only eigenvalue of the two similar matrices $A$ and $J$, of multiplicity 5. Since

$$\operatorname{rank}(J - \lambda I) = \operatorname{rank} \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix} = 4,$$

$\dim N(J - \lambda I) = \dim E_\lambda = 1$. That is, $J$ has only one linearly independent eigenvector, namely $e_1 = (1, 0, 0, 0, 0)$. Thus $Qe_1 = x_1$ is an eigenvector of $A$.

To see what the other columns of $Q$ are, we expand $AQ = QJ$ as

$$[Ax_1\ Ax_2\ Ax_3\ Ax_4\ Ax_5] = [\lambda x_1\ \ x_1 + \lambda x_2\ \ x_2 + \lambda x_3\ \ x_3 + \lambda x_4\ \ x_4 + \lambda x_5].$$


8.2. Generalized eigenvectors 329 


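Thus, we get

$$\begin{array}{lll} Ax_5 = x_4 + \lambda x_5, & \text{or} & (A - \lambda I)x_5 = x_4, \\ Ax_4 = x_3 + \lambda x_4, & \text{or} & (A - \lambda I)x_4 = (A - \lambda I)^2x_5 = x_3, \\ Ax_3 = x_2 + \lambda x_3, & \text{or} & (A - \lambda I)x_3 = (A - \lambda I)^3x_5 = x_2, \\ Ax_2 = x_1 + \lambda x_2, & \text{or} & (A - \lambda I)x_2 = (A - \lambda I)^4x_5 = x_1, \\ Ax_1 = \lambda x_1, & \text{or} & (A - \lambda I)x_1 = (A - \lambda I)^5x_5 = 0. \end{array}$$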


This shows that if we find a vector $x_5$ such that $(A - \lambda I)^5x_5 = 0$ and $(A - \lambda I)^4x_5 = x_1 \neq 0$, then the other $x_i$'s are obtained from $x_5$ via $(A - \lambda I)^ix_5 = x_{5-i}$ for $i = 1, \ldots, 5$, with $x_0 = 0$. Such a vector $x_5$ is called a generalized eigenvector of rank 5, and the set $\{x_1, \ldots, x_5\}$ is called a chain of generalized eigenvectors belonging to $\lambda$; both notions are defined below. Therefore, the columns of the transition matrix $Q$ consist of chains of generalized eigenvectors.


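Remark: Note that, in Example 8.5, if we set $Q' = [x_5\ x_4\ x_3\ x_2\ x_1] = QP$, where $P$ is the orthogonal permutation matrix that reverses the order of the columns, then one gets

$$Q'^{-1}AQ' = P^{-1}(Q^{-1}AQ)P = P^{-1}JP = \begin{bmatrix} \lambda & 0 & 0 & 0 & 0 \\ 1 & \lambda & 0 & 0 & 0 \\ 0 & 1 & \lambda & 0 & 0 \\ 0 & 0 & 1 & \lambda & 0 \\ 0 & 0 & 0 & 1 & \lambda \end{bmatrix} = J'.$$

That is, $J$ and $J'$ are similar and represent essentially the same Jordan canonical form of $A$.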


In general, by expanding $AQ = QJ$, one can easily see that the columns of $Q$ corresponding to the first columns of the Jordan blocks of $J$ form a maximal set of linearly independent eigenvectors of $A$, and the remaining columns of $Q$ are generalized eigenvectors.


Definition 8.1 A nonzero vector $x$ is said to be a generalized eigenvector of $A$ of rank $k$ belonging to an eigenvalue $\lambda$ if

$$(A - \lambda I)^kx = 0 \quad \text{and} \quad (A - \lambda I)^{k-1}x \neq 0.$$




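Note that if $k = 1$, this is the usual definition of an eigenvector. For a generalized eigenvector $x$ of rank $k > 1$ belonging to an eigenvalue $\lambda$, define

$$\begin{aligned} x_k &= x, \\ x_{k-1} &= (A - \lambda I)x = (A - \lambda I)x_k, \\ x_{k-2} &= (A - \lambda I)^2x = (A - \lambda I)x_{k-1}, \\ &\ \ \vdots \\ x_2 &= (A - \lambda I)^{k-2}x = (A - \lambda I)x_3, \\ x_1 &= (A - \lambda I)^{k-1}x = (A - \lambda I)x_2. \end{aligned}$$

Thus, for each $\ell$, $1 \le \ell \le k$, $(A - \lambda I)^\ell x_\ell = (A - \lambda I)^kx = 0$ and $(A - \lambda I)^{\ell-1}x_\ell = (A - \lambda I)^{k-1}x \neq 0$. Hence, the vector $x_\ell = (A - \lambda I)^{k-\ell}x$ is a generalized eigenvector of $A$ of rank $\ell$.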


Definition 8.2 The set of vectors $\{x_1, x_2, \ldots, x_k\}$ is called a chain of generalized eigenvectors belonging to the eigenvalue $\lambda$.


Note that $x_1 = (A - \lambda I)^{k-1}x$ is always an eigenvector belonging to $\lambda$, called the initial eigenvector of the chain. Sometimes the length $k$ of a chain is also called the rank of this initial eigenvector of the chain. Note also that $(A - \lambda I)^\ell x_i = 0$ for $\ell \ge i$.
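Building a chain from its top vector is easy to automate. The sketch below (plain Python; `chain_from_top` is a hypothetical helper name, not from the text) applies $A - \lambda I$ repeatedly to a given vector $x$ until the next vector would be 0; the length of the returned list is then the rank of $x$. It is checked on the matrix of Example 8.4, whose chain is also built by hand in Example 8.9.

```python
def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def chain_from_top(A, lam, x):
    """Return [x_1, ..., x_k] with x_k = x and x_{l-1} = (A - lam I) x_l.

    Raises ValueError if x is not a generalized eigenvector belonging to lam.
    """
    n = len(A)
    B = [[A[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    if all(v == 0 for v in x):
        raise ValueError("a generalized eigenvector must be nonzero")
    chain = [list(x)]
    for _ in range(n):  # the rank of a generalized eigenvector is at most n
        y = mat_vec(B, chain[0])
        if all(v == 0 for v in y):
            return chain  # chain[0] is the initial eigenvector x_1
        chain.insert(0, y)
    raise ValueError("x is not a generalized eigenvector belonging to lam")


A = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [-1, 4, -6, 4]]  # Example 8.4
print(chain_from_top(A, 1, [-1, 0, 0, 0]))
# [[1, 1, 1, 1], [-1, 0, 1, 2], [1, 0, 0, 1], [-1, 0, 0, 0]]
```

The loop bound of $n$ steps is what lets the helper reject a vector outside the generalized eigenspace instead of iterating forever.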

The following series of three theorems shows that a transition matrix 
Q may be constructed from the chains of linearly independent generalized 
eigenvectors of A, and justifies the invertibility of Q. 


Theorem 8.2 A chain of generalized eigenvectors $S = \{x_1, x_2, \ldots, x_k\}$ belonging to an eigenvalue $\lambda$ is linearly independent.


Proof: Let us solve $c_1x_1 + c_2x_2 + \cdots + c_kx_k = 0$ for scalars $c_i$, $i = 1, \ldots, k$. If we multiply (on the left) both sides of this equation by $(A - \lambda I)^{k-1}$, then, for $i = 1, \ldots, k - 1$,

$$(A - \lambda I)^{k-1}x_i = (A - \lambda I)^{k-1-i}(A - \lambda I)^ix_i = 0.$$

Thus, $c_k(A - \lambda I)^{k-1}x_k = c_kx_1 = 0$, and hence $c_k = 0$.

Do the same to the equation $c_1x_1 + \cdots + c_{k-1}x_{k-1} = 0$ with $(A - \lambda I)^{k-2}$ and get $c_{k-1} = 0$. Proceeding successively, we can show that $c_i = 0$ for all $i = 1, \ldots, k$. That is, the equation has only the trivial solution. Hence, the set $S$ is linearly independent.


Theorem 8.3 The union of chains of generalized eigenvectors of a square matrix $A$ belonging to distinct eigenvalues is linearly independent.




Proof: Let $\{x_1, x_2, \ldots, x_k\}$ and $\{y_1, y_2, \ldots, y_\ell\}$ be chains of generalized eigenvectors of $A$ belonging to the eigenvalues $\lambda$ and $\mu$, respectively, and let $\lambda \neq \mu$. We wish to show that the set of vectors $\{x_1, \ldots, x_k, y_1, \ldots, y_\ell\}$ is linearly independent. To solve the linear dependence relation

$$c_1x_1 + \cdots + c_kx_k + d_1y_1 + \cdots + d_\ell y_\ell = 0$$

for the $c_i$'s and $d_j$'s, we multiply both sides of the equation by $(A - \lambda I)^k$ and note that $(A - \lambda I)^kx_i = 0$ for all $i = 1, \ldots, k$. Thus we have

$$(A - \lambda I)^k(d_1y_1 + d_2y_2 + \cdots + d_\ell y_\ell) = 0.$$

Again, multiply this equation by $(A - \mu I)^{\ell-1}$ and note that

$$(A - \mu I)^{\ell-1}(A - \lambda I)^k = (A - \lambda I)^k(A - \mu I)^{\ell-1}, \qquad (A - \mu I)^{\ell-1}y_\ell = y_1, \qquad (A - \mu I)^{\ell-1}y_i = 0$$

for $i = 1, \ldots, \ell - 1$. Thus we obtain

$$0 = d_\ell(A - \lambda I)^ky_1.$$

Because $(A - \mu I)y_1 = 0$ (or $Ay_1 = \mu y_1$), we have $(A - \lambda I)y_1 = (\mu - \lambda)y_1$, so this reduces to

$$d_\ell(\mu - \lambda)^ky_1 = 0,$$

which implies that $d_\ell = 0$ by the assumptions $\lambda \neq \mu$ and $y_1 \neq 0$. Proceeding successively, we can show that $d_i = 0$ for $i = \ell, \ell - 1, \ldots, 2, 1$, so we are left with

$$c_1x_1 + \cdots + c_kx_k = 0.$$

Since $\{x_1, \ldots, x_k\}$ is already linearly independent by Theorem 8.2, $c_i = 0$ for all $i = 1, \ldots, k$. Thus the set of generalized eigenvectors $\{x_1, \ldots, x_k, y_1, \ldots, y_\ell\}$ is linearly independent.


The next step to produce Q such that AQ = QJ is to describe a method 
for choosing chains of generalized eigenvectors from a generalized eigenspace, 
which is defined below, so that the union of the chains is linearly indepen- 
dent. 


Definition 8.3 Let $\lambda$ be an eigenvalue of $A$. The generalized eigenspace of $A$ belonging to $\lambda$, denoted by $K_\lambda$, is the set

$$K_\lambda = \{x \in \mathbb{C}^n : (A - \lambda I)^px = 0 \text{ for some positive integer } p\}.$$




It turns out that $\dim K_\lambda$ is the multiplicity of $\lambda$, and $K_\lambda$ contains the usual eigenspace $N(A - \lambda I)$. The following theorem enables us to choose a basis for $K_\lambda$; we omit its proof, although it can be proved by induction on the number of vectors in $S \cup T$.


Theorem 8.4 Let $S = \{x_1, x_2, \ldots, x_k\}$ and $T = \{y_1, y_2, \ldots, y_\ell\}$ be two chains of generalized eigenvectors of $A$ belonging to the same eigenvalue $\lambda$. If the initial vectors $x_1$ and $y_1$ are linearly independent, then the union $S \cup T$ is linearly independent.


Note that this theorem easily extends to any finite number of chains of generalized eigenvectors of $A$ belonging to an eigenvalue $\lambda$ whose initial eigenvectors are linearly independent, and the union of such chains, with lengths summing to $m_\lambda$, forms a basis for $K_\lambda$, so that the matrix $Q$ may be constructed from these bases, one for each eigenvalue, as usual.

However, finding chains of generalized eigenvectors is not simple in general. One may try to find them in the following two ways:

Method 1: (1) For each eigenvalue $\lambda$ of multiplicity $m_\lambda$, first find a maximal set of linearly independent eigenvectors $x, y, \ldots, z$, which form a basis for the eigenspace $E_\lambda$, and then solve

$$(A - \lambda I)x_2 = ax + by + \cdots + cz = x_1$$

for suitable $a, b, \ldots, c$ that make the systems (including those in step (2) below) consistent.

(2) Inductively, solve $(A - \lambda I)x_i = x_{i-1}$ for $i = 3, 4, \ldots$, until the equation becomes inconsistent. When $(A - \lambda I)x_{k+1} = x_k$ becomes inconsistent, $k$ is the rank of the generalized eigenvector $x_k$.

(3) If $k < m_\lambda$, find by the same method as in step (1) another suitable vector $y_1$ in $E_\lambda$ that is linearly independent of $x_1$, and then repeat the above process.

(4) Do this until the ranks sum to $m_\lambda$.

(5) An inconvenience of this method is that one does not know at the outset what the initial eigenvectors of the chains and their ranks are.


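Example 8.6 The characteristic polynomial of

$$A = \begin{bmatrix} 5 & -3 & -2 \\ 8 & -5 & -4 \\ -4 & 3 & 3 \end{bmatrix}$$

is $\det(\lambda I - A) = \lambda^3 - 3\lambda^2 + 3\lambda - 1 = (\lambda - 1)^3$, and the eigenvalue of $A$ is $\lambda = 1$, of multiplicity 3. Its eigenvectors are the solutions of

$$(A - I)x = \begin{bmatrix} 4 & -3 & -2 \\ 8 & -6 & -4 \\ -4 & 3 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$

Since the three equations are multiples of one another, one gets two linearly independent eigenvectors $u_1 = (1, 0, 2)$ and $u_2 = (0, 2, -3)$. A generalized eigenvector is a solution of $(A - I)x = c_1u_1 + c_2u_2 = x_1$ for some constants $c_i$. In fact, the system

$$\begin{bmatrix} 4 & -3 & -2 \\ 8 & -6 & -4 \\ -4 & 3 & 2 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = c_1\begin{bmatrix} 1 \\ 0 \\ 2 \end{bmatrix} + c_2\begin{bmatrix} 0 \\ 2 \\ -3 \end{bmatrix} = \begin{bmatrix} c_1 \\ 2c_2 \\ 2c_1 - 3c_2 \end{bmatrix}$$

has a solution if and only if $c_1 = c_2$, and a solution is $x_2 = (0, 0, -1)$ for $x_1 = (2, 4, -2)$ (that is, $c_1 = c_2 = 2$). Since the next equation $(A - I)x = x_2$ is inconsistent, the rank of the generalized eigenvector $x_2$ is 2, so that $\{x_1, x_2\}$ is a chain. Since $u_1 = (1, 0, 2)$ is linearly independent of $x_1$, we may take $x_3 = u_1$, and so

$$Q = [x_1\ x_2\ x_3] = \begin{bmatrix} 2 & 0 & 1 \\ 4 & 0 & 0 \\ -2 & -1 & 2 \end{bmatrix}$$

is a transition matrix such that $Q^{-1}AQ = J = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$.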


Method 2: (1) Determine the Jordan canonical form, together with the order of each block, as explained in the previous section.

(2) For each eigenvalue $\lambda$ and each block of order $k$ belonging to $\lambda$, find a solution $x$ of $(A - \lambda I)^kx = 0$ with $(A - \lambda I)^{k-1}x \neq 0$, and then construct the chain of this generalized eigenvector.

(3) An inconvenience of this method is that one has to compute $(A - \lambda I)^k$ for each eigenvalue $\lambda$ and each $k$, which is not simple if the order of $A$ is large.


Example 8.7 For the same matrix $A$ as in Example 8.6, we know that $A$ has exactly two Jordan blocks, both belonging to its only eigenvalue 1. By direct computation we find that $A - I \neq 0$ and $(A - I)^2 = 0$. Thus we can take any vector $x_2$ such that $(A - I)x_2 \neq 0$; it is then a generalized eigenvector of rank 2. Take $x_2 = (1, 1, 1)$; then $x_1 = (A - I)x_2 = (-1, -2, 1)$. Now find another eigenvector of $A$ belonging to $\lambda = 1$, linearly independent of $x_1$, by solving $(A - I)x_3 = 0$, and get $x_3 = (1, 0, 2)$. Then for

$$Q = [x_1\ x_2\ x_3] = \begin{bmatrix} -1 & 1 & 1 \\ -2 & 1 & 0 \\ 1 & 1 & 2 \end{bmatrix}, \quad \text{we get} \quad Q^{-1}AQ = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
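A quick way to double-check a transition matrix found by hand is to verify $AQ = QJ$ together with $\det Q \neq 0$, which avoids computing $Q^{-1}$. A minimal Python sketch for the two transition matrices of Examples 8.6 and 8.7 (the helper names `matmul` and `det3` are our own):

```python
def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[5, -3, -2], [8, -5, -4], [-4, 3, 3]]
J = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
Q86 = [[2, 0, 1], [4, 0, 0], [-2, -1, 2]]   # Example 8.6
Q87 = [[-1, 1, 1], [-2, 1, 0], [1, 1, 2]]   # Example 8.7
for Q in (Q86, Q87):
    # AQ = QJ together with det Q != 0 is equivalent to Q^{-1} A Q = J
    print(matmul(A, Q) == matmul(Q, J), det3(Q))
```

Both checks print `True` with a nonzero determinant ($-4$ and $-1$), confirming that the two different choices of chains lead to the same $J$.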
Example 8.8 For the matrix

$$A = \begin{bmatrix} 2 & 1 & 4 \\ 0 & 2 & -1 \\ 0 & 0 & 3 \end{bmatrix},$$

find a transition matrix $Q$ so that $Q^{-1}AQ$ is the Jordan canonical matrix.


Solution: Write $Q = [x_1\ x_2\ x_3]$; we try to find the columns $x_i$, $i = 1, 2, 3$.

Method 1: In Example 8.3, the Jordan canonical form $J$ of $A$ was determined as

$$J = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix}.$$

From $AQ = QJ$, by comparing the column vectors of both sides, one gets

$$Ax_1 = 2x_1, \qquad Ax_2 = 2x_2 + x_1, \qquad Ax_3 = 3x_3.$$

Thus $x_1$ and $x_3$ are eigenvectors of $A$ belonging to $\lambda = 2$ and $\lambda = 3$, respectively. By direct computation, they are found to be $x_1 = (1, 0, 0)$ and $x_3 = (3, -1, 1)$. By solving the equation $(A - 2I)x_2 = x_1$, one gets $x_2 = (a, 1, 0)$ with any constant $a$, so that

$$Q = \begin{bmatrix} 1 & a & 3 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}.$$

It is not hard to see that $x_1, x_2, x_3$ are linearly independent, so that $Q^{-1}AQ = J$.

Method 2: From Example 8.3, the Jordan block belonging to 2 is of order 2. Thus we need to find a generalized eigenvector $x_2$ of rank 2 belonging to the eigenvalue 2, which is a solution of the following systems:

$$(A - 2I)x = \begin{bmatrix} 0 & 1 & 4 \\ 0 & 0 & -1 \\ 0 & 0 & 1 \end{bmatrix}x \neq 0, \qquad (A - 2I)^2x = \begin{bmatrix} 0 & 0 & 3 \\ 0 & 0 & -1 \\ 0 & 0 & 1 \end{bmatrix}x = 0.$$

From the second equation, $x_2$ has to be of the form $(a, b, 0)$, and from the first equation we must have $b \neq 0$. Let us take $x_2 = (0, 1, 0)$ as a generalized eigenvector of rank 2. Thus we have

$$(A - 2I)x_2 = x_1 = (1, 0, 0), \qquad (A - 2I)^2x_2 = (A - 2I)x_1 = 0.$$

Thus $x_1 = (1, 0, 0)$ is, up to scalar multiples, the only eigenvector belonging to 2. An eigenvector belonging to $\lambda = 3$ is easily found to be $x_3 = (3, -1, 1)$. Clearly, the set of vectors $\{x_1, x_2, x_3\}$ is linearly independent, and so

$$Q = \begin{bmatrix} 1 & 0 & 3 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}, \qquad Q^{-1} = \begin{bmatrix} 1 & 0 & -3 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$

Then

$$Q^{-1}AQ = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} = \begin{bmatrix} J_1 & 0 \\ 0 & J_2 \end{bmatrix}, \quad \text{where } J_1 = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix} \text{ and } J_2 = [3].$$


Example 8.9 Find $Q$ so that $Q^{-1}AQ = J$ is the Jordan canonical form of the matrix

$$A = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 4 & -6 & 4 \end{bmatrix}.$$




Solution: Method 1: In Example 8.4, we have found that

$$J = Q^{-1}AQ = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Write a transition matrix as $Q = [x_1\ x_2\ x_3\ x_4]$. From the expansion of $AQ = QJ$, one gets

$$Ax_1 = x_1, \qquad Ax_2 = x_2 + x_1, \qquad Ax_3 = x_3 + x_2, \qquad Ax_4 = x_4 + x_3.$$

There is only one linearly independent eigenvector belonging to $\lambda = 1$, which is $x_1 = (1, 1, 1, 1)$. Now a solution of $(A - I)x_2 = x_1$ is $x_2 = (a, a + 1, a + 2, a + 3)$ for any $a$. Take $x_2 = (0, 1, 2, 3)$. Inductively, the solutions of $(A - I)x_3 = x_2$ are $x_3 = (b, b, b + 1, b + 3)$ for any $b$; set $x_3 = (0, 0, 1, 3)$, and successively one can take $x_4 = (0, 0, 0, 1)$ as a solution of $(A - I)x_4 = x_3$. Clearly, they are linearly independent, and so

$$Q = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \\ 1 & 3 & 3 & 1 \end{bmatrix}.$$

One may check $Q^{-1}AQ = J$ by a direct matrix multiplication.

Method 2: As we saw in Example 8.4, $A$ has only one Jordan block. Therefore, one has to find a generalized eigenvector of rank 4, which is a solution $x$ of the following equations:

$$(A - I)^4x = 0, \qquad (A - I)^3x = \begin{bmatrix} -1 & 3 & -3 & 1 \\ -1 & 3 & -3 & 1 \\ -1 & 3 & -3 & 1 \\ -1 & 3 & -3 & 1 \end{bmatrix}x \neq 0.$$

But a direct computation shows that the matrix $(A - I)^4 = 0$. Hence, one can take any vector that satisfies the second condition as a generalized eigenvector of rank 4. Take $x_4 = (-1, 0, 0, 0)$; then

$$x_3 = (A - I)x_4 = \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \\ -1 & 4 & -6 & 3 \end{bmatrix}\begin{bmatrix} -1 \\ 0 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \\ 0 \\ 1 \end{bmatrix},$$

$$x_2 = (A - I)x_3 = (-1, 0, 1, 2), \qquad x_1 = (A - I)x_2 = (1, 1, 1, 1).$$


8.3. Applications of the Jordan canonical forms 337 


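Therefore, these vectors $\{x_1, x_2, x_3, x_4\}$ form a chain of linearly independent generalized eigenvectors. Hence,

$$Q = \begin{bmatrix} 1 & -1 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 2 & 1 & 0 \end{bmatrix}, \qquad Q^{-1} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & -2 & 1 \\ -1 & 3 & -3 & 1 \end{bmatrix},$$

and

$$Q^{-1}AQ = J = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$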
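The text suggests checking $Q^{-1}AQ = J$ by direct multiplication; here is a minimal Python version of that check for both transition matrices of Example 8.9 (the helper `matmul` is our own; integer arithmetic keeps the comparisons exact).

```python
def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [-1, 4, -6, 4]]
J = [[1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1]]
I4 = [[int(i == j) for j in range(4)] for i in range(4)]

Q1 = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 2, 1, 0], [1, 3, 3, 1]]        # Method 1
Q2 = [[1, -1, 1, -1], [1, 0, 0, 0], [1, 1, 0, 0], [1, 2, 1, 0]]      # Method 2
Q2_inv = [[0, 1, 0, 0], [0, -1, 1, 0], [0, 1, -2, 1], [-1, 3, -3, 1]]

print(matmul(A, Q1) == matmul(Q1, J))        # True: AQ = QJ for Method 1
print(matmul(Q2_inv, Q2) == I4)              # True: the stated inverse is correct
print(matmul(matmul(Q2_inv, A), Q2) == J)    # True: Q^{-1} A Q = J for Method 2
```

For Method 1 the sketch checks $AQ = QJ$ instead of forming $Q^{-1}$; since that $Q$ is lower triangular with ones on the diagonal, it is clearly invertible.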


Problem 8.3 Find a full set of generalized eigenvectors of the following matrices: 


-2 0 -2 26. 31 Si 
JESEN! JE 6 a|. 
0 2 1 


8.3 Applications of the Jordan canonical forms 


The Jordan canonical form of any square matrix $A$ enables us to compute the power $A^k$ and the exponential matrix $e^A$, and to solve many other problems related to the matrix $A$. Let $J$ be the Jordan canonical form of an arbitrary $n \times n$ square matrix $A$ such that

$$Q^{-1}AQ = J = \begin{bmatrix} J_1 & & 0 \\ & \ddots & \\ 0 & & J_s \end{bmatrix},$$

where $Q$ is made of generalized eigenvectors of $A$ and the $J_i$'s are Jordan blocks.


8.3.1 Computation of the powers $A^k$ of $A$


Since we have

$$A^k = (QJQ^{-1})^k = QJ^kQ^{-1} = Q\begin{bmatrix} J_1^k & & 0 \\ & \ddots & \\ 0 & & J_s^k \end{bmatrix}Q^{-1}$$

for $k = 1, 2, \ldots$, it is enough to compute $J^k$ for each Jordan block $J$.
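Now an $m \times m$ Jordan block $J$ belonging to an eigenvalue $\lambda$ of $A$ may be written as

$$J = \begin{bmatrix} \lambda & 1 & & 0 \\ & \lambda & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda \end{bmatrix} = \lambda\begin{bmatrix} 1 & & & 0 \\ & 1 & & \\ & & \ddots & \\ 0 & & & 1 \end{bmatrix} + \begin{bmatrix} 0 & 1 & & 0 \\ & 0 & \ddots & \\ & & \ddots & 1 \\ 0 & & & 0 \end{bmatrix} = \lambda I + N.$$

Since $I$ is the identity matrix, clearly $IN = NI$, and so

$$J^k = (\lambda I + N)^k = \sum_{j=0}^{k}\binom{k}{j}\lambda^{k-j}N^j.$$

But $N^j = 0$ for $j \ge m$. Thus, with the convention $\binom{k}{j} = 0$ if $k < j$,

$$J^k = \sum_{j=0}^{m-1}\binom{k}{j}\lambda^{k-j}N^j = \lambda^kI + \binom{k}{1}\lambda^{k-1}N + \cdots + \binom{k}{m-1}\lambda^{k-(m-1)}N^{m-1}$$

$$= \begin{bmatrix} \lambda^k & \binom{k}{1}\lambda^{k-1} & \binom{k}{2}\lambda^{k-2} & \cdots & \binom{k}{m-1}\lambda^{k-m+1} \\ 0 & \lambda^k & \binom{k}{1}\lambda^{k-1} & \cdots & \binom{k}{m-2}\lambda^{k-m+2} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda^k & \binom{k}{1}\lambda^{k-1} \\ 0 & 0 & \cdots & 0 & \lambda^k \end{bmatrix}.$$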
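The binomial formula for $J^k$ can be checked against repeated multiplication. A minimal Python sketch (the helper names are our own) that builds the $(i, j)$ entry $\binom{k}{j-i}\lambda^{k-(j-i)}$ for $0 \le j - i \le k$ and 0 otherwise, which is the convention $\binom{k}{j} = 0$ for $k < j$ from the text:

```python
from math import comb


def jordan_block(lam, m):
    """m x m Jordan block with lam on the diagonal and 1 on the superdiagonal."""
    return [[lam if j == i else (1 if j == i + 1 else 0) for j in range(m)] for i in range(m)]


def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def block_power(lam, m, k):
    """J^k from the binomial formula; entries with j - i > k vanish since C(k, j - i) = 0."""
    return [[comb(k, j - i) * lam ** (k - (j - i)) if 0 <= j - i <= k else 0
             for j in range(m)] for i in range(m)]


def block_power_by_multiplication(lam, m, k):
    P = [[int(i == j) for j in range(m)] for i in range(m)]
    for _ in range(k):
        P = matmul(P, jordan_block(lam, m))
    return P


print(block_power(2, 3, 5))  # [[32, 80, 80], [0, 32, 80], [0, 0, 32]]
```

Skipping the entries with $j - i > k$ explicitly also avoids negative powers of $\lambda$, which would fail for $\lambda = 0$.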
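Example 8.10 Compute $A^k$, $k = 1, 2, \ldots$, for

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 0 \\ 0 & 0 & 2 & 0 \\ -1 & 1 & 0 & 3 \end{bmatrix}.$$

Solution: The characteristic polynomial of $A$ is $\det(\lambda I - A) = \lambda^4 - 8\lambda^3 + 24\lambda^2 - 32\lambda + 16 = (\lambda - 2)^4$, and $\lambda = 2$ is an eigenvalue of multiplicity 4. By a direct computation, one can see that $\operatorname{rank}(A - 2I) = 2$, $\operatorname{rank}(A - 2I)^2 = 1$ and $\operatorname{rank}(A - 2I)^3 = 0$. Thus, the Jordan canonical form $J$ of $A$ must be of the form

$$J = \begin{bmatrix} 2 & 1 & 0 & 0 \\ 0 & 2 & 1 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 2 \end{bmatrix} = \begin{bmatrix} J_1 & 0 \\ 0 & J_2 \end{bmatrix}.$$

Also one can easily find a transition matrix $Q$ to be

$$Q = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 2 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & -1 \end{bmatrix}, \quad \text{and then} \quad Q^{-1} = \begin{bmatrix} 2 & -1 & 0 & -1 \\ -1 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 2 & -1 & 0 & -2 \end{bmatrix},$$

so that $A^k = QJ^kQ^{-1}$ with $J^k = \begin{bmatrix} J_1^k & 0 \\ 0 & 2^k \end{bmatrix}$, where

$$\begin{aligned} J_1^k = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix}^k &= \left(\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix} + \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}\right)^k \\ &= \begin{bmatrix} 2^k & 0 & 0 \\ 0 & 2^k & 0 \\ 0 & 0 & 2^k \end{bmatrix} + k2^{k-1}\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} + \frac{k(k-1)}{2}2^{k-2}\begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \\ &= \begin{bmatrix} 2^k & k2^{k-1} & \frac{k(k-1)}{2}2^{k-2} \\ 0 & 2^k & k2^{k-1} \\ 0 & 0 & 2^k \end{bmatrix}. \end{aligned}$$

Hence,

$$A^k = \begin{bmatrix} 2^k - k2^{k-1} & k2^{k-1} & k2^{k-1} + \frac{k(k-1)}{2}2^{k-2} & k2^{k-1} \\ 0 & 2^k & k2^k & 0 \\ 0 & 0 & 2^k & 0 \\ -k2^{k-1} & k2^{k-1} & \frac{k(k-1)}{2}2^{k-2} & 2^k + k2^{k-1} \end{bmatrix}.$$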
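The closed form for $A^k$ in Example 8.10 is easy to get wrong by hand, so here is a minimal Python check against repeated multiplication. It uses exact fractions because $2^{k-2}$ is fractional for $k < 2$; the helper names are our own.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[1, 1, 1, 1], [0, 2, 2, 0], [0, 0, 2, 0], [-1, 1, 0, 3]]


def A_power(k):
    """Closed form for A^k from Example 8.10."""
    a = Fraction(2) ** k                                    # 2^k
    b = k * Fraction(2) ** (k - 1)                          # k 2^(k-1)
    c = Fraction(k * (k - 1), 2) * Fraction(2) ** (k - 2)   # C(k, 2) 2^(k-2)
    return [[a - b, b, b + c, b],
            [0, a, 2 * b, 0],
            [0, 0, a, 0],
            [-b, b, c, a + b]]


P = [[int(i == j) for j in range(4)] for i in range(4)]  # P = A^0
for k in range(0, 10):
    assert A_power(k) == P, k  # compare with A^k computed by repeated multiplication
    P = matmul(P, A)
print("closed form agrees for k = 0, ..., 9")
```

The formula even holds for $k = 0$, where it reduces to the identity matrix.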
Problem 8.4 Compute $A^k$, $k = 1, 2, \ldots$, for
2 1 0 0 0 -83 1 2 
0 2 10 —2 1 1 2 
(D2 0 0 2 0p? VB —2 1 -1 2 
0 00 1 —2 —3 1 4 


8.3.2 Computation of the exponential matrix $e^A$
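Note that

$$e^A = e^{QJQ^{-1}} = Qe^JQ^{-1} = Q\begin{bmatrix} e^{J_1} & & 0 \\ & \ddots & \\ 0 & & e^{J_s} \end{bmatrix}Q^{-1},$$

where the $J_i$'s are the Jordan blocks. Thus, it is enough to compute $e^J$ for a single Jordan block $J$. Let

$$J = \lambda I + N,$$

where $I$ and $N$ are as in Section 8.3.1, now of order $n$. Then $N^k = 0$ for $k \ge n$, and, since $\lambda I$ and $N$ commute,

$$e^J = e^{\lambda I + N} = e^\lambda e^N = e^\lambda\sum_{k=0}^{n-1}\frac{1}{k!}N^k = e^\lambda\begin{bmatrix} 1 & 1 & \frac{1}{2!} & \cdots & \frac{1}{(n-1)!} \\ 0 & 1 & 1 & \cdots & \frac{1}{(n-2)!} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & 1 \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}.$$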
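Example 8.11 Compute $e^A$ for

$$A = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 0 \\ 0 & 0 & 2 & 0 \\ -1 & 1 & 0 & 3 \end{bmatrix}.$$

Solution: With the same notation as in Example 8.10,

$$e^A = e^{QJQ^{-1}} = Qe^JQ^{-1} = Q\begin{bmatrix} e^{J_1} & 0 \\ 0 & e^{J_2} \end{bmatrix}Q^{-1},$$

where

$$J_1 = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 1 \\ 0 & 0 & 2 \end{bmatrix} = 2I + \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix} = 2I + N \quad \text{and} \quad J_2 = [2].$$

Hence,

$$e^{J_1} = e^{2I + N} = e^2e^N = e^2\left(I + N + \frac{1}{2!}N^2\right) = e^2\begin{bmatrix} 1 & 1 & \frac{1}{2} \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix}.$$

It gives that

$$e^A = Qe^JQ^{-1} = \begin{bmatrix} 0 & e^2 & \frac{3}{2}e^2 & e^2 \\ 0 & e^2 & 2e^2 & 0 \\ 0 & 0 & e^2 & 0 \\ -e^2 & e^2 & \frac{1}{2}e^2 & 2e^2 \end{bmatrix}.$$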
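A floating-point sanity check of Example 8.11 (a sketch, not the method of the text): sum the Taylor series of $e^A$ directly and compare the result with the closed form obtained from the Jordan canonical form. The 40-term cutoff is our own choice; it is ample for this small matrix.

```python
import math


def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def expm_series(M, terms=40):
    """Truncated Taylor series sum_{p < terms} M^p / p!; adequate for this small matrix."""
    n = len(M)
    term = [[float(i == j) for j in range(n)] for i in range(n)]  # M^0 / 0!
    total = [row[:] for row in term]
    for p in range(1, terms):
        term = [[v / p for v in row] for row in matmul(term, M)]  # M^p / p!
        total = [[s + t for s, t in zip(rs, rt)] for rs, rt in zip(total, term)]
    return total


A = [[1, 1, 1, 1], [0, 2, 2, 0], [0, 0, 2, 0], [-1, 1, 0, 3]]
e2 = math.exp(2)
closed_form = [[0, e2, 1.5 * e2, e2],
               [0, e2, 2 * e2, 0],
               [0, 0, e2, 0],
               [-e2, e2, 0.5 * e2, 2 * e2]]
approx = expm_series(A)
print(max(abs(x - y) for rx, ry in zip(approx, closed_form) for x, y in zip(rx, ry)))
```

The printed maximum deviation should be at the level of floating-point rounding. A truncated series like this is only a check; the Jordan form gives the exact answer.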


8.3.3 Linear differential equations II


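Now, we go back to a system of linear differential equations

$$y' = Ay \quad \text{with an initial condition } y(0) = y_0.$$

Its solution is known to be $y(t) = e^{tA}y_0$ (see Theorem 6.26). In particular, if $A$ is diagonalizable, this solution can be written as

$$y(t) = e^{tA}y_0 = c_1e^{\lambda_1t}v_1 + c_2e^{\lambda_2t}v_2 + \cdots + c_ne^{\lambda_nt}v_n,$$

where $v_1, v_2, \ldots, v_n$ are the eigenvectors belonging to the eigenvalues $\lambda_i$'s of $A$. For an arbitrary square matrix $A$ (not necessarily diagonalizable), let $Q^{-1}AQ = J$ be the Jordan canonical form of $A$. Then the solution is

$$\begin{aligned} e^{tA}y_0 &= Qe^{tJ}Q^{-1}y_0 = [u_1\ u_2\ \cdots\ u_n]\begin{bmatrix} e^{tJ_1} & & 0 \\ & \ddots & \\ 0 & & e^{tJ_s} \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \\ &= [u_1\ u_2\ \cdots\ u_n]\begin{bmatrix} e^{\lambda_1t}e^{tN_1} & & 0 \\ & \ddots & \\ 0 & & e^{\lambda_st}e^{tN_s} \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}, \end{aligned}$$

where $Q^{-1}y_0 = (c_1, \ldots, c_n)$, $J_i = \lambda_iI + N_i$, and the $u_i$'s are generalized eigenvectors of $A$. In particular, if $J$ is a single Jordan block of order $n$ with corresponding generalized eigenvectors $u_1, \ldots, u_n$, then the solution becomes

$$\begin{aligned} e^{tA}y_0 &= Qe^{tJ}Q^{-1}y_0 = e^{\lambda t}[u_1\ u_2\ \cdots\ u_n]\begin{bmatrix} 1 & t & \frac{t^2}{2!} & \cdots & \frac{t^{n-1}}{(n-1)!} \\ 0 & 1 & t & \cdots & \frac{t^{n-2}}{(n-2)!} \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 & t \\ 0 & 0 & \cdots & 0 & 1 \end{bmatrix}\begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix} \\ &= e^{\lambda t}\left(\Big(\sum_{k=0}^{n-1}c_{k+1}\frac{t^k}{k!}\Big)u_1 + \Big(\sum_{k=0}^{n-2}c_{k+2}\frac{t^k}{k!}\Big)u_2 + \cdots + c_nu_n\right). \end{aligned}$$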


Example 8.12 Solve the linear differential equation $y' = Ay$ with initial condition $y(0) = y_0$, where




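Solution: (1) The characteristic polynomial of $A$ is $\det(\lambda I - A) = \lambda^3 - 7\lambda^2 + 16\lambda - 12 = (\lambda - 3)(\lambda - 2)^2$. Since $x_1 = (-1, -1, 1)$ and $x_3 = (2, 1, -1)$ are linearly independent eigenvectors belonging to $\lambda = 2$ and $\lambda = 3$, respectively, and $\lambda = 2$ has no other linearly independent eigenvector, one can compute the Jordan canonical form of $A$ as follows:

$$Q^{-1}AQ = J = \begin{bmatrix} 2 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{bmatrix} = \begin{bmatrix} J_1 & 0 \\ 0 & J_2 \end{bmatrix},$$

where

$$J_1 = \begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}, \quad J_2 = [3], \quad \text{and} \quad Q = \begin{bmatrix} -1 & 1 & 2 \\ -1 & 1 & 1 \\ 1 & 0 & -1 \end{bmatrix}.$$

(2) Let $y = Qx$. Then the given system changes to $x' = Jx$ with $x(0) = Q^{-1}y_0$, and the solution of this new system is given by

$$x(t) = e^{tJ}x(0) = \begin{bmatrix} e^{2t} & te^{2t} & 0 \\ 0 & e^{2t} & 0 \\ 0 & 0 & e^{3t} \end{bmatrix}Q^{-1}y_0,$$

so that $y(t) = Qx(t) = Qe^{tJ}Q^{-1}y_0$.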


Example 8.13 Solve a system of linear differential equations $y'(t) = Ay(t)$, where

$$A = \begin{bmatrix} 5 & -3 & -2 \\ 8 & -5 & -4 \\ -4 & 3 & 3 \end{bmatrix}.$$




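Solution: In Example 8.6 or 8.7, we have found a transition matrix

$$Q = [x_1\ x_2\ x_3] = \begin{bmatrix} 2 & 0 & 1 \\ 4 & 0 & 0 \\ -2 & -1 & 2 \end{bmatrix}$$

such that

$$Q^{-1}AQ = J = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} J_1 & 0 \\ 0 & J_2 \end{bmatrix}, \quad J_1 = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}, \quad J_2 = [1].$$

Thus, the solution $y(t) = e^{tA}y_0$ is

$$e^{tA}y_0 = Qe^{tJ}Q^{-1}y_0 = Q\begin{bmatrix} e^t & te^t & 0 \\ 0 & e^t & 0 \\ 0 & 0 & e^t \end{bmatrix}Q^{-1}y_0 = \begin{bmatrix} (1 + 4t)e^t & -3te^t & -2te^t \\ 8te^t & (1 - 6t)e^t & -4te^t \\ -4te^t & 3te^t & (1 + 2t)e^t \end{bmatrix}y_0.$$

Or, if we set $Q^{-1}y_0 = (d_1, d_2, d_3)$, then

$$y(t) = e^t[x_1\ \ tx_1 + x_2\ \ x_3]\begin{bmatrix} d_1 \\ d_2 \\ d_3 \end{bmatrix} = e^t\big((d_1 + d_2t)x_1 + d_2x_2 + d_3x_3\big).$$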
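Because $N = A - I$ satisfies $N^2 = 0$ here, the matrix above is just $e^{tA} = e^t(I + tN)$. The minimal Python sketch below checks this numerically: it confirms $N^2 = 0$ and compares a central-difference derivative of $y(t)$ with $Ay(t)$. The initial vector, the time $t = 0.7$ and the step $h$ are arbitrary choices of ours, made only for illustration.

```python
import math


def mat_vec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[5, -3, -2], [8, -5, -4], [-4, 3, 3]]
N = [[A[i][j] - (1 if i == j else 0) for j in range(3)] for i in range(3)]  # N = A - I
assert matmul(N, N) == [[0] * 3 for _ in range(3)]  # N^2 = 0, so e^{tA} = e^t (I + tN)


def y(t, y0):
    """y(t) = e^{tA} y0 = e^t (y0 + t N y0)."""
    Ny0 = mat_vec(N, y0)
    return [math.exp(t) * (a + t * b) for a, b in zip(y0, Ny0)]


y0 = [1.0, 0.0, 0.0]  # an arbitrary initial vector, chosen only for illustration
t, h = 0.7, 1e-6
derivative = [(p - q) / (2 * h) for p, q in zip(y(t + h, y0), y(t - h, y0))]
rhs = mat_vec(A, y(t, y0))
print(max(abs(d - r) for d, r in zip(derivative, rhs)))  # close to 0
```

With $y_0 = (1, 0, 0)$ the solution is the first column of the matrix above, $e^t(1 + 4t, 8t, -4t)$.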


Problem 8.5 Solve the system of linear differential equations $y' = Ay$ with the initial condition $y(0) = y_0$, where


ck el l 


8.3.4 Linear difference equations II 


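Let $A$ be the companion matrix of order $k$ of a linear difference equation $x_n = Ax_{n-1}$, and let $\lambda_0$ be an eigenvalue of $A$ with multiplicity $m > 1$. Then, by Lemma 6.12, $A$ has only one linearly independent eigenvector $v$ belonging to $\lambda_0$, of the form $[\lambda_0^{k-1}\ \cdots\ \lambda_0\ 1]^T$, which means that the size of the Jordan block corresponding to this eigenvector is the multiplicity $m$ of $\lambda_0$. Therefore, if $A$ has $s$ distinct eigenvalues $\lambda_j$, each of multiplicity $m_j$, $j = 1, 2, \ldots, s$, so that $m_1 + m_2 + \cdots + m_s = k$, then the Jordan canonical form of $A$ has $s$ Jordan blocks, each of order $m_j$, i.e.,

$$Q^{-1}AQ = J = \begin{bmatrix} J_1 & & 0 \\ & \ddots & \\ 0 & & J_s \end{bmatrix}, \quad \text{where} \quad J_j = \begin{bmatrix} \lambda_j & 1 & & 0 \\ & \lambda_j & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_j \end{bmatrix} \ (m_j \times m_j).$$

The chain of generalized eigenvectors containing $v_{j1} = [\lambda_j^{k-1}\ \cdots\ \lambda_j\ 1]^T$ consists of $m_j$ vectors $v_{ji}$, $i = 1, 2, \ldots, m_j$, which, from $AQ = QJ$, satisfy

$$(A - \lambda_jI)v_{j1} = 0, \quad (A - \lambda_jI)v_{j2} = v_{j1}, \quad \ldots, \quad (A - \lambda_jI)v_{jm_j} = v_{j(m_j-1)}.$$

Beginning with $v_{j1} = [\lambda_j^{k-1}\ \cdots\ \lambda_j\ 1]^T$, one can easily compute a chain of generalized eigenvectors inductively: if we denote a generalized eigenvector by $v = [x_{k-1}\ \cdots\ x_1\ x_0]^T$, then

$$\begin{array}{lll} \text{for } v_{j1}, & x_n = \lambda_j^n = \binom{n}{0}\lambda_j^n, & \text{for } n \ge 0, \\ \text{for } v_{j2}, & x_n = n\lambda_j^{n-1} = \binom{n}{1}\lambda_j^{n-1}, & \text{for } n \ge 1, \\ \text{for } v_{j3}, & x_n = \frac{n(n-1)}{2}\lambda_j^{n-2} = \binom{n}{2}\lambda_j^{n-2}, & \text{for } n \ge 2, \\ \quad\vdots & \quad\vdots & \\ \text{for } v_{jm_j}, & x_n = \binom{n}{m_j-1}\lambda_j^{n-(m_j-1)}, & \text{for } n \ge m_j - 1. \end{array}$$

Thus the columns of the transition matrix $Q$ corresponding to $J_j$ look like

$$\begin{bmatrix} \lambda_j^{k-1} & \binom{k-1}{1}\lambda_j^{k-2} & \binom{k-1}{2}\lambda_j^{k-3} & \cdots & \binom{k-1}{m_j-1}\lambda_j^{(k-1)-(m_j-1)} \\ \vdots & \vdots & \vdots & & \vdots \\ \lambda_j^2 & 2\lambda_j & 1 & \cdots & 0 \\ \lambda_j & 1 & 0 & \cdots & 0 \\ 1 & 0 & 0 & \cdots & 0 \end{bmatrix}.$$

Notice that the coefficients form Pascal's triangle. Therefore, the general solution of a linear difference equation $x_n = Ax_{n-1}$ is of the form

$$x_n = \sum_{j=1}^{s}\left(c_{j1}\binom{n}{0}\lambda_j^n + c_{j2}\binom{n}{1}\lambda_j^{n-1} + \cdots + c_{jm_j}\binom{n}{m_j-1}\lambda_j^{n-m_j+1}\right)$$

for $n \ge 0$. Here we assume that $\binom{n}{r} = 0$ if $n < r$.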
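To see the general solution formula at work in the simplest repeated-root case ($s = 1$, $m_1 = 2$), take the hypothetical recurrence $x_n = 4x_{n-1} - 4x_{n-2}$, whose characteristic polynomial is $(\lambda - 2)^2$; the recurrence and the starting values are our own illustration. The formula predicts $x_n = c_{11}\binom{n}{0}2^n + c_{12}\binom{n}{1}2^{n-1}$, with the constants fixed by $x_0$ and $x_1$.

```python
from math import comb


def closed_form(n, lam, coeffs):
    """x_n = sum_i coeffs[i] * C(n, i) * lam^(n - i), where C(n, i) = 0 when n < i."""
    return sum(c * comb(n, i) * lam ** (n - i) for i, c in enumerate(coeffs) if n >= i)


x0, x1 = 3, 10
c1, c2 = x0, x1 - 2 * x0  # from x_0 = c1 and x_1 = 2*c1 + c2

xs = [x0, x1]
for n in range(2, 15):
    xs.append(4 * xs[n - 1] - 4 * xs[n - 2])

print(xs[:5])  # [3, 10, 28, 72, 176]
print(all(xs[n] == closed_form(n, 2, [c1, c2]) for n in range(15)))  # True
```

The term with $\binom{n}{1}$ is exactly the contribution of the second vector $v_{12}$ in the chain; without it, a pure $c\,2^n$ could not match both starting values.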




8.3.5 Discrete dynamical systems II 


In Section 6.3.2, discrete dynamical systems were introduced, and it was shown that 1 is always an eigenvalue of a stochastic matrix. We now show that the magnitudes of all the eigenvalues of a stochastic matrix are bounded by 1, and moreover, if a stochastic matrix has positive entries, then each of its eigenvalues $\lambda$ satisfies either $|\lambda| < 1$ or $\lambda = 1$, so that no eigenvalue satisfies $|\lambda| = 1$ with $\lambda \neq 1$.

For this, we first try to estimate a bound for the magnitudes of the eigenvalues of any square matrix $A$ (including complex matrices) in terms of the absolute values of the entries of $A$. For any square matrix $A = [a_{ij}]$ of order $k$, let

$$\begin{aligned} R(A) &= \max\Big\{R_i(A) = \sum_{j=1}^{k}|a_{ij}| : 1 \le i \le k\Big\}, \\ C(A) &= \max\Big\{C_j(A) = \sum_{i=1}^{k}|a_{ij}| : 1 \le j \le k\Big\}, \\ s_i &= R_i(A) - |a_{ii}|. \end{aligned}$$


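Theorem 8.5 (Gerschgorin's Theorem) For any square matrix $A$ of order $k$, every eigenvalue $\lambda$ of $A$ satisfies $|\lambda - a_{\ell\ell}| \le s_\ell$ for some $1 \le \ell \le k$.

Proof: Let $\lambda$ be an eigenvalue with eigenvector $x = [x_1\ x_2\ \cdots\ x_k]^T$. Then $\sum_{j=1}^{k}a_{ij}x_j = \lambda x_i$ for $i = 1, \ldots, k$. Take a coordinate $x_\ell$ of $x$ with the largest absolute value. Then clearly $x_\ell \neq 0$, and

$$|\lambda - a_{\ell\ell}||x_\ell| = |\lambda x_\ell - a_{\ell\ell}x_\ell| = \Big|\sum_{j \neq \ell}a_{\ell j}x_j\Big| \le \sum_{j \neq \ell}|a_{\ell j}||x_\ell| = s_\ell|x_\ell|.$$

Since $|x_\ell| > 0$, $|\lambda - a_{\ell\ell}| \le s_\ell$.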


Corollary 8.6 For any square matrix $A$ of order $k$, every eigenvalue $\lambda$ of $A$ satisfies $|\lambda| \le \min\{R(A), C(A)\}$.

Proof: Note that $|\lambda| \le |\lambda - a_{\ell\ell}| + |a_{\ell\ell}| \le s_\ell + |a_{\ell\ell}| = R_\ell(A) \le R(A)$. Moreover, since $\lambda$ is also an eigenvalue of $A^T$, $|\lambda| \le R(A^T) = C(A)$.
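Theorem 8.5 and Corollary 8.6 are easy to evaluate for a given matrix. The following plain-Python sketch (helper names are our own) lists the Gerschgorin discs $|z - a_{\ell\ell}| \le s_\ell$, tests whether a number lies in at least one of them, and computes the bound $\min\{R(A), C(A)\}$; it is checked against eigenvalues already found in Examples 8.3 and 8.6. It works unchanged for complex entries, since `abs` gives the modulus.

```python
def gerschgorin_discs(A):
    """Pairs (a_ll, s_l): center a_ll and radius s_l = R_l(A) - |a_ll| for each row l."""
    return [(row[l], sum(abs(v) for v in row) - abs(row[l])) for l, row in enumerate(A)]


def in_some_disc(A, lam):
    """Theorem 8.5: every eigenvalue lies in at least one disc |z - a_ll| <= s_l."""
    return any(abs(lam - center) <= radius for center, radius in gerschgorin_discs(A))


def eigenvalue_bound(A):
    """Corollary 8.6: |lam| <= min{R(A), C(A)}."""
    k = len(A)
    R = max(sum(abs(v) for v in row) for row in A)
    C = max(sum(abs(A[i][j]) for i in range(k)) for j in range(k))
    return min(R, C)


A83 = [[2, 1, 4], [0, 2, -1], [0, 0, 3]]        # eigenvalues 2 and 3 (Example 8.3)
A86 = [[5, -3, -2], [8, -5, -4], [-4, 3, 3]]    # eigenvalue 1 (Example 8.6)
print(gerschgorin_discs(A83))                    # [(2, 5), (2, 1), (3, 0)]
print(in_some_disc(A83, 2), in_some_disc(A83, 3), in_some_disc(A86, 1))  # True True True
print(eigenvalue_bound(A83), eigenvalue_bound(A86))  # 7 17
```

The bounds are only estimates: for the matrix of Example 8.6, whose only eigenvalue is 1, Corollary 8.6 gives just $|\lambda| \le 17$.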


Furthermore, if A is a stochastic matrix, then C(A) = 1. 




Corollary 8.7 If $\lambda$ is an eigenvalue of a stochastic matrix $A$, then $|\lambda| \le 1$.


We next give a quick explanation of the key properties of a positive 
matrix. 


Theorem 8.8 (Perron-Frobenius Theorem) Let $A$ be a matrix with positive entries. If $\lambda_0$ is an eigenvalue of $A$ such that $|\lambda_0|$ is the largest, then $\lambda_0$ is real and positive, and so are the components of its eigenvector $\mathbf{x}$.


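Proof: Let $\lambda$ and $\mathbf{z}$ be such that $A\mathbf{z} = \lambda\mathbf{z}$. Then $|\lambda||\mathbf{z}| = |A\mathbf{z}| \le A|\mathbf{z}|$, where $\mathbf{x} = |\mathbf{z}|$ (taken entrywise) is a nonnegative vector. Thus the following number exists:

$$\lambda_0 = \max\{t \in \mathbb{R} \mid A\mathbf{x} \ge t\mathbf{x} \text{ for some } \mathbf{x} \ge 0,\ \mathbf{x} \ne \mathbf{0}\}.$$

Then clearly $\lambda_0 \ge |\lambda| \ge 0$. Let $\mathbf{x}$ be a vector attaining this maximum. We claim that $A\mathbf{x} = \lambda_0\mathbf{x}$, which shows that $\lambda_0$ is the largest eigenvalue, with a nonnegative eigenvector. Indeed, suppose not: $A\mathbf{x} \ge \lambda_0\mathbf{x}$ but $A\mathbf{x} \ne \lambda_0\mathbf{x}$, i.e., some components are equalities and some are strict inequalities. Then, since $A$ is positive, one can easily see that we have the strict inequality

$$A^2\mathbf{x} > \lambda_0A\mathbf{x},\quad\text{that is,}\quad A\mathbf{y} > \lambda_0\mathbf{y}\ \text{ with }\ \mathbf{y} = A\mathbf{x},$$

so that $A\mathbf{y} \ge t\mathbf{y}$ for some $t$ slightly larger than $\lambda_0$, which contradicts the maximality of $\lambda_0$.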
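A positive matrix can be probed numerically with power iteration. This is a sketch, not the construction used in the proof: it relies on the fact (part of the full Perron theorem, not proved above) that for a matrix with positive entries $\lambda_0$ strictly dominates every other eigenvalue in modulus, so repeated multiplication by $A$ pulls a positive starting vector toward the positive eigenvector. The matrix and the helper name `perron_root` are hypothetical.

```python
import numpy as np

def perron_root(A, iters=500):
    """Power iteration for a matrix with positive entries.

    Returns an estimate of lambda_0 and a positive eigenvector
    normalized so that its entries sum to 1.
    """
    x = np.ones(A.shape[0])            # positive starting vector
    for _ in range(iters):
        y = A @ x
        x = y / y.sum()                # entries stay positive since A > 0
    lam0 = (A @ x).sum() / x.sum()     # A x = lam0 x, so compare the sums
    return lam0, x

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])            # hypothetical positive matrix
lam0, x = perron_root(A)
print(lam0, x)                        # lam0 is close to (5 + sqrt 5)/2
```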


Note that a stochastic matrix $A$ has 1 as an eigenvalue by (1) of Theorem 6.18. In the following, we will further show that, if a stochastic matrix $A$ has all positive entries, then no eigenvalue of $A$ other than $\lambda = 1$ satisfies $|\lambda| = 1$. Thus 1 is the largest eigenvalue with a nonnegative eigenvector.


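Theorem 8.9 Let $A$ be a matrix with positive entries. If $\lambda$ is an eigenvalue of $A$ such that $|\lambda| = R(A)$, then $\lambda = R(A)$ and $\dim E_\lambda = 1$ with a basis $\mathbf{u} = [1\ 1 \cdots 1]^T$.


Proof: Let $\mathbf{x} = [x_1\ x_2 \cdots x_k]^T$ be an eigenvector belonging to $\lambda$. Take a coordinate $x_\ell$ of $\mathbf{x}$ with the largest absolute value. Then

$$\begin{aligned}|\lambda||x_\ell| = |\lambda x_\ell| = \Big|\sum_{j=1}^{k}a_{\ell j}x_j\Big| &\le \sum_{j=1}^{k}|a_{\ell j}||x_j|\\ &\le \sum_{j=1}^{k}|a_{\ell j}||x_\ell| = R_\ell(A)|x_\ell|\\ &\le R(A)|x_\ell|.\end{aligned}$$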




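Since $|\lambda| = R(A)$, the three inequalities are actually equalities, i.e.,

$$\Big|\sum_{j=1}^{k}a_{\ell j}x_j\Big| = \sum_{j=1}^{k}|a_{\ell j}||x_j|,\qquad \sum_{j=1}^{k}|a_{\ell j}||x_j| = \sum_{j=1}^{k}|a_{\ell j}||x_\ell|,\qquad R_\ell(A) = R(A).$$

It is an easy exercise (see Problem 8.6 below) to show that the first equality means that all terms $a_{\ell j}x_j$, $j = 1, \ldots, k$, are nonnegative multiples of some complex number $z$ with $|z| = 1$. Thus $a_{\ell j}x_j = c_jz$, or $x_j = \frac{c_j}{a_{\ell j}}z$, for some nonnegative real numbers $c_1, \ldots, c_k$. The second equality means that $a_{\ell j} = 0$ or $|x_j| = |x_\ell|$ for all $j = 1, \ldots, k$. Since $a_{\ell j} > 0$ by assumption, we have the second possibility: $|x_j| = |x_\ell| = M$ for all $j = 1, \ldots, k$. This means that, for each $j = 1, \ldots, k$, the above computation is valid with $\ell$ replaced by $j$; i.e., $R_j(A) = R(A)$ for all $j$. Thus one gets $M = |x_j| = \frac{c_j}{a_{\ell j}}$, or $x_j = Mz$ for all $j = 1, \ldots, k$, and so $\mathbf{x} = [x_1 \cdots x_k]^T = Mz[1\ 1 \cdots 1]^T$. That is, $\mathbf{u} = [1\ 1 \cdots 1]^T$ is a basis for $E_\lambda$, so that $\dim E_\lambda = 1$. Moreover, since $a_{ij} = |a_{ij}|$,

$$A\mathbf{u} = \begin{bmatrix}\sum_j a_{1j}\\ \vdots\\ \sum_j a_{kj}\end{bmatrix} = \begin{bmatrix}R_1(A)\\ \vdots\\ R_k(A)\end{bmatrix} = \begin{bmatrix}R(A)\\ \vdots\\ R(A)\end{bmatrix} = R(A)\mathbf{u},$$

or $A\mathbf{x} = R(A)\mathbf{x}$. Thus if $\mathbf{x}$ is an eigenvector belonging to $\lambda$ with $|\lambda| = R(A)$, then it is an eigenvector belonging also to $R(A)$, so that $\lambda = R(A)$.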


Problem 8.6 (1) Prove that $|u + v| \le |u| + |v|$ for any complex numbers $u$ and $v$, and that equality holds if and only if $u = az$ and $v = bz$, where $a$, $b$ are nonnegative real numbers and $z$ is a complex number such that $|z| = 1$.

(2) Use induction to show $\left|\sum_{i=1}^{n}z_i\right| \le \sum_{i=1}^{n}|z_i|$, and that equality holds if and only if $z_i = c_iz$ for some nonnegative real numbers $c_i$ and a complex number $z$ such that $|z| = 1$.


Note that $|\lambda| = C(A)$ also implies $\lambda = C(A)$, since $C(A) = R(A^T)$ and $A^T$ also has positive entries. If $A$ is a stochastic matrix, then $C(A) = 1$, and so we have the following.


Corollary 8.10 Let $A$ be a stochastic matrix with positive entries, and let $\lambda$ be an eigenvalue of $A$ other than 1. Then $|\lambda| < 1$. Moreover, $\dim E_1 = 1$.




Problem 8.7 Prove that the eigenspaces of $A$ and $A^T$ belonging to a common eigenvalue $\lambda$ have the same dimension.


Therefore, if $A$ is a diagonalizable stochastic matrix with positive entries, then $\lim_{n\to\infty}A^n = L$ always exists by Theorem 6.16 and Corollary 8.10. For a general (nondiagonalizable) matrix, we have the following criterion.


Theorem 8.11 Let $A$ be any square matrix. Then $\lim_{k\to\infty}A^k$ exists if and only if the following conditions hold:

(1) Every eigenvalue $\lambda$ of $A$ satisfies $|\lambda| < 1$ or $\lambda = 1$.

(2) If 1 is an eigenvalue of $A$, then $\dim E_1$ is the multiplicity of 1.


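Proof: Let $J$ be the Jordan canonical form of $A$. Then $\lim_{k\to\infty}A^k$ exists if and only if $\lim_{k\to\infty}J^k$ exists. Let $J_i$ be a Jordan block of order $m$ in $J$ with eigenvalue $\lambda$, of the form (1) on page 338, so that the entries on the $j$-th superdiagonal of $J_i^k$ are $\binom{k}{j}\lambda^{k-j}$, $0 \le j < m$. If $|\lambda| < 1$, then $\lim_{k\to\infty}\binom{k}{j}\lambda^{k-j} = 0$ for every $j$. If $|\lambda| \ge 1$ and $m \ge 2$, then $\lim_{k\to\infty}\left|\binom{k}{1}\lambda^{k-1}\right| = \lim_{k\to\infty}k|\lambda|^{k-1} = \infty$. If $m = 1$, then $J_i^k = [\lambda^k]$ converges only when $|\lambda| < 1$ or $\lambda = 1$. It follows that $\lim_{k\to\infty}J_i^k$ exists if and only if either $|\lambda| < 1$, or $\lambda = 1$ and $m = 1$. In the former case, $\lim_{k\to\infty}J_i^k = 0$, and in the latter case, $\lim_{k\to\infty}J_i^k = [1]$; so each Jordan block of $\lambda = 1$ is a $1\times 1$ matrix, which means $\dim E_1$ = multiplicity of 1.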


In general, it may happen that $\dim E_1$ is less than the multiplicity of the eigenvalue $\lambda = 1$, as the following example shows. In fact, we have seen in the Remark on page 233 that the matrix $A = \begin{bmatrix}1 & 1\\ 0 & 1\end{bmatrix}$ is not diagonalizable, since 1 is an eigenvalue of multiplicity 2 while $\dim E_1 = 1$. Thus the second condition of Theorem 8.11 does not hold, and one can easily see that $\lim_{n\to\infty}A^n$ does not exist, since $A^n = \begin{bmatrix}1 & n\\ 0 & 1\end{bmatrix}$.


The following theorem summarizes what has been discussed so far about stochastic matrices.


Theorem 8.12 Let $A$ be a stochastic matrix with positive entries. Then

(1) The multiplicity of the eigenvalue $\lambda = 1$ of $A$ is 1.

(2) $\lim_{k\to\infty}A^k = L$ exists (that is, the diagonalizability of $A$ is not necessary for this), and $L$ is also a stochastic matrix.

(3) $AL = LA = L$.

(4) The columns of $L$ are identical. In fact, each column of $L$ is equal to the unique eigenvector $\mathbf{u}$ of $A$ belonging to $\lambda = 1$ which is also a probability vector.

(5) For any probability vector $\mathbf{x}$, $\lim_{k\to\infty}A^k\mathbf{x} = \mathbf{u}$.


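Proof: (1) By the proof of Theorem 8.11, each Jordan block of 1 is $1\times 1$, while $\dim E_1 = 1$ by Corollary 8.10; i.e., 1 has only one Jordan block.

(2) The first part follows directly from part (1), Corollary 8.10, and Theorem 8.11. For the second part, since the $A^k$ are stochastic matrices, each entry of $A^k$ is nonnegative for $k = 1, 2, \ldots$. Thus $[L]_{ij} = \lim_{k\to\infty}[A^k]_{ij} \ge 0$ for $1 \le i, j \le n$. Moreover, for $1 \le j \le n$,

$$\sum_{i=1}^{n}[L]_{ij} = \sum_{i=1}^{n}\lim_{k\to\infty}[A^k]_{ij} = \lim_{k\to\infty}\sum_{i=1}^{n}[A^k]_{ij} = \lim_{k\to\infty}1 = 1.$$

(3) Trivial, since $A(\lim_{k\to\infty}A^k) = (\lim_{k\to\infty}A^k)A = \lim_{k\to\infty}A^{k+1} = L$.

(4) Since $AL = L$, each column of $L$ is an eigenvector of $A$ belonging to $\lambda = 1$. But $\dim E_1 = 1$ means that the columns of $L$ are scalar multiples of one eigenvector belonging to $\lambda = 1$, and each of them is a probability vector by part (2); hence they are all equal to $\mathbf{u}$.

(5) The vector $\mathbf{y} = L\mathbf{x}$ is also a probability vector by (2) and Lemma 6.17, and $A\mathbf{y} = AL\mathbf{x} = L\mathbf{x} = \mathbf{y}$. That is, $\mathbf{y}$ is also an eigenvector of $A$ belonging to $\lambda = 1$, and so $\mathbf{y} = \mathbf{u}$. Hence $\lim_{k\to\infty}A^k\mathbf{x} = L\mathbf{x} = \mathbf{u}$.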


Note that (5) in Theorem 8.12 means that the eventual distribution of the objects depends only on the sum of all the components of the initial vector, not on how that total is initially distributed.

The vector u in Theorem 8.12 is called the stationary vector or the 
steady state of A. 
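Theorem 8.12 suggests two ways of computing the steady state: normalize an eigenvector for $\lambda = 1$, or take a large power of $A$. The sketch below compares both on a hypothetical $2\times 2$ stochastic matrix with positive entries (columns summing to 1, as in this chapter); `steady_state` is an illustrative name, and `matrix_power(A, 100)` merely stands in for the limit $L$.

```python
import numpy as np

def steady_state(A):
    """Eigenvector of A for the eigenvalue 1, scaled to a probability vector."""
    eigvals, eigvecs = np.linalg.eig(A)
    idx = np.argmin(np.abs(eigvals - 1.0))   # eigenvalue closest to 1
    u = np.real(eigvecs[:, idx])
    return u / u.sum()                       # entries now sum to 1 (fixes the sign too)

# Hypothetical stochastic matrix with positive entries
A = np.array([[0.9, 0.2],
              [0.1, 0.8]])
u = steady_state(A)
L = np.linalg.matrix_power(A, 100)           # approximates lim A^k
x = np.array([0.3, 0.7])                     # an arbitrary probability vector
print(u)        # approximately [2/3, 1/3]
print(L)        # both columns approximately equal to u, as in (4)
print(L @ x)    # approximately u, as in (5)
```

Here the second eigenvalue is $0.7$, so $0.7^{100}$ is negligible and the power already agrees with the limit to machine precision.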


Remark: Actually, in Theorem 8.12, the positivity of the entries of a stochastic matrix may be weakened: Theorem 8.12 still holds if some power of the stochastic matrix $A$ has only positive entries. For example, for the matrix

$$A = \begin{bmatrix}0.8 & 0 & 0.6\\ 0 & 0.3 & 0.4\\ 0.2 & 0.7 & 0\end{bmatrix},$$

$A^2$ has positive entries. To prove this assertion, it is enough to reprove Corollary 8.10.

In fact, suppose that there is $d > 1$ such that $A^d$ has only positive entries. Then the entries of $A^{d+1} = (A^d)A$ are all positive, since every column of the stochastic matrix $A$ has at least one positive entry. If $\lambda$ is an eigenvalue of $A$ such that $|\lambda| = 1$, then $\lambda^d$ and $\lambda^{d+1}$ are eigenvalues of $A^d$ and $A^{d+1}$, respectively, with absolute value 1. Thus, by Corollary 8.10, $\lambda^d = \lambda^{d+1} = 1$, which means $\lambda = 1$. Note that $E_1(A) \subseteq E_1(A^d)$ and $\dim E_1(A^d) = 1$, which means $E_1(A) = E_1(A^d)$ and $\dim E_1(A) = 1$.
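For the matrix in this Remark one can check numerically that $A$ itself has zero entries while $A^2$ is positive, and that the powers $A^k$ still converge to a matrix with identical columns. The search limit $(n-1)^2 + 1$ used below is Wielandt's classical bound for primitive matrices, brought in as an outside fact; the function name is hypothetical. Solving $(A - I)\mathbf{u} = \mathbf{0}$ by hand gives $\mathbf{u} = [21/32\ \ 1/8\ \ 7/32]^T$, which the last comment refers to.

```python
import numpy as np

def first_positive_power(A):
    """Smallest d with every entry of A^d positive, or None.

    Stops at (n-1)^2 + 1 (Wielandt's bound): if some power of a
    nonnegative n x n matrix is positive, that power already is.
    """
    n = A.shape[0]
    P = np.eye(n)
    for d in range(1, (n - 1) ** 2 + 2):
        P = P @ A
        if np.all(P > 0):
            return d
    return None

A = np.array([[0.8, 0.0, 0.6],
              [0.0, 0.3, 0.4],
              [0.2, 0.7, 0.0]])
d = first_positive_power(A)          # 2 for this matrix
L = np.linalg.matrix_power(A, 200)   # approximates lim A^k
print(d)
print(L)                             # every column close to [21/32, 1/8, 7/32]
```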


8.4 Cayley-Hamilton theorem 


As we saw in earlier chapters, associating the characteristic polynomial with each matrix is very useful in studying matrices. In this section, using this association of polynomials with matrices, we prove one more useful theorem, called the Cayley-Hamilton theorem, which simplifies the calculation of matrix polynomials and has many applications to real problems.

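Let $f(x) = a_mx^m + a_{m-1}x^{m-1} + \cdots + a_1x + a_0$ be a polynomial, and let $A$ be an $n\times n$ square matrix. The matrix defined by

$$f(A) = a_mA^m + a_{m-1}A^{m-1} + \cdots + a_1A + a_0I_n$$

is called a matrix polynomial of $A$. For example, if $f(x) = x^2 - 2x + 2$ and $A = \begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix}$, then

$$f(A) = A^2 - 2A + 2I_2 = \begin{bmatrix}5 & 4\\ 4 & 5\end{bmatrix} - 2\begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix} + 2\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix} = \begin{bmatrix}5 & 0\\ 0 & 5\end{bmatrix}.$$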
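A matrix polynomial can be evaluated with Horner's rule, $f(A) = (\cdots(a_mA + a_{m-1}I)A + \cdots)A + a_0I$, which needs only $m$ matrix products. A minimal NumPy sketch for $f(x) = x^2 - 2x + 2$ and $A = \begin{bmatrix}1&2\\2&1\end{bmatrix}$; the helper name `matrix_poly` is our own.

```python
import numpy as np

def matrix_poly(coeffs, A):
    """Evaluate a_m A^m + ... + a_1 A + a_0 I by Horner's rule.

    coeffs lists a_m, ..., a_1, a_0 (highest degree first).
    """
    n = A.shape[0]
    result = np.zeros((n, n))
    for a in coeffs:
        result = result @ A + a * np.eye(n)
    return result

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
print(matrix_poly([1, -2, 2], A))   # [[5, 0], [0, 5]] up to float formatting
```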


Problem 8.8 Let $\lambda$ be an eigenvalue of $A$ and $\mathbf{x}$ an eigenvector belonging to $\lambda$. Show that if $f$ is any polynomial, then $f(\lambda)$ is an eigenvalue of the matrix polynomial $f(A)$, with eigenvector $\mathbf{x}$.


Theorem 8.13 (Cayley-Hamilton) For any $n\times n$ matrix $A$, if $f(\lambda) = \det(\lambda I - A)$ is the characteristic polynomial of $A$, then $f(A) = 0$.


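Proof: By Theorem 8.1, any square matrix $A$ is similar to its Jordan canonical form

$$J = \begin{bmatrix}J_1 & & 0\\ & \ddots & \\ 0 & & J_s\end{bmatrix} = Q^{-1}AQ,\quad\text{or}\quad A = QJQ^{-1}.$$

Then clearly $f(A) = Qf(J)Q^{-1}$, and

$$f(J) = \begin{bmatrix}f(J_1) & & 0\\ & \ddots & \\ 0 & & f(J_s)\end{bmatrix}.$$

From Exercise 4.21, the characteristic polynomial $f_i$ of each Jordan block $J_i$ is a factor of that of $A$, i.e.,

$$\det(\lambda I - A) = \det(\lambda I - J) = \det(\lambda I - J_1)\det(\lambda I - J_2)\cdots\det(\lambda I - J_s),$$

or $f(\lambda) = f_1(\lambda)f_2(\lambda)\cdots f_s(\lambda)$. Since $f_i$ is a factor of $f$, $f_i(J_i) = 0$ implies $f(J_i) = 0$. Thus it is enough to show that $f_i(J_i) = 0$ for a single Jordan block. Let $J_i = \alpha I + N$ be a Jordan block of order $m$ with eigenvalue $\alpha$, in which $N^m = 0$. Since $f_i(\lambda) = \det(\lambda I - J_i) = (\lambda - \alpha)^m$,

$$f_i(J_i) = (J_i - \alpha I)^m = (\alpha I + N - \alpha I)^m = N^m = 0.$$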
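Example 8.14 The characteristic polynomial of

$$A = \begin{bmatrix}3 & 6 & 6\\ 0 & 2 & 0\\ -3 & -12 & -6\end{bmatrix}$$

is $f(\lambda) = \det(\lambda I - A) = \lambda^3 + \lambda^2 - 6\lambda$, and

$$f(A) = A^3 + A^2 - 6A = \begin{bmatrix}27 & 78 & 54\\ 0 & 8 & 0\\ -27 & -102 & -54\end{bmatrix} + \begin{bmatrix}-9 & -42 & -18\\ 0 & 4 & 0\\ 9 & 30 & 18\end{bmatrix} - 6\begin{bmatrix}3 & 6 & 6\\ 0 & 2 & 0\\ -3 & -12 & -6\end{bmatrix} = \begin{bmatrix}0 & 0 & 0\\ 0 & 0 & 0\\ 0 & 0 & 0\end{bmatrix}.$$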
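The Cayley-Hamilton theorem is easy to test numerically. In the sketch below, `numpy.poly`, given a square array, returns the coefficients of its characteristic polynomial $\det(xI - A)$ (highest degree first), and the result is evaluated at $A$ by Horner's rule; for the matrix of Example 8.14 the output should vanish up to floating-point rounding. The helper name is hypothetical.

```python
import numpy as np

def char_poly_at_matrix(A):
    """Return the characteristic polynomial's coefficients and f(A)."""
    coeffs = np.poly(A)                   # [1, c_1, ..., c_n] of det(xI - A)
    n = A.shape[0]
    result = np.zeros((n, n))
    for c in coeffs:                      # Horner's rule with matrices
        result = result @ A + c * np.eye(n)
    return coeffs, result

A = np.array([[3.0, 6.0, 6.0],
              [0.0, 2.0, 0.0],
              [-3.0, -12.0, -6.0]])
coeffs, fA = char_poly_at_matrix(A)
print(coeffs)   # approximately [1, 1, -6, 0], i.e. x^3 + x^2 - 6x
print(fA)       # the zero matrix, up to rounding
```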


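Remark: It is interesting that the last part of the proof of the Cayley-Hamilton Theorem 8.13 can be replaced by a direct computation of $f(J)$ for a Jordan block $J = \alpha I + N$ of order $m$, as follows. Recall that

$$J^k = \sum_{j=0}^{m-1}\binom{k}{j}\alpha^{k-j}N^j = \begin{bmatrix}\alpha^k & \binom{k}{1}\alpha^{k-1} & \cdots & \binom{k}{m-1}\alpha^{k-m+1}\\ 0 & \alpha^k & \ddots & \vdots\\ \vdots & \ddots & \ddots & \binom{k}{1}\alpha^{k-1}\\ 0 & \cdots & 0 & \alpha^k\end{bmatrix}.$$

Therefore, for any polynomial $f(\lambda) = b_s\lambda^s + \cdots + b_1\lambda + b_0$,

$$f(J) = b_sJ^s + \cdots + b_1J + b_0I = \begin{bmatrix}f(\alpha) & f'(\alpha) & \cdots & \frac{f^{(m-1)}(\alpha)}{(m-1)!}\\ 0 & f(\alpha) & \ddots & \vdots\\ \vdots & \ddots & \ddots & f'(\alpha)\\ 0 & \cdots & 0 & f(\alpha)\end{bmatrix},$$

where $f^{(i)}(\lambda)$ is the $i$-th derivative of $f(\lambda)$. In particular, if $f(\lambda)$ is the characteristic polynomial of $A$ and $J$ is a Jordan block of $A$ belonging to the eigenvalue $\alpha$, then $(\lambda - \alpha)^m$ divides $f(\lambda)$ (the multiplicity of $\alpha$ is at least the order $m$ of the block), and so $f^{(i)}(\alpha) = 0$ for $i = 0, 1, \ldots, m-1$. Thus $f(J) = 0$.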
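The entries of $f(J)$ described in this Remark can be checked against a direct evaluation. In this sketch the polynomial $f(t) = t^4 - 3t + 1$ and the block of order 3 with $\alpha = 2$ are hypothetical choices; `numpy.polyder` supplies the derivatives, and the $i$-th superdiagonal should carry $f^{(i)}(\alpha)/i!$, here $11$, $29$ and $24$.

```python
import math
import numpy as np

def jordan_block(alpha, m):
    """m x m Jordan block alpha*I + N."""
    return alpha * np.eye(m) + np.eye(m, k=1)

def poly_of_matrix(coeffs, A):
    """Horner evaluation of a polynomial (highest degree first) at A."""
    n = A.shape[0]
    result = np.zeros((n, n))
    for c in coeffs:
        result = result @ A + c * np.eye(n)
    return result

coeffs = [1, 0, 0, -3, 1]            # f(t) = t^4 - 3t + 1
alpha, m = 2.0, 3
fJ = poly_of_matrix(coeffs, jordan_block(alpha, m))

# Prediction from the Remark: f^(i)(alpha) / i! on the i-th superdiagonal
predicted = np.zeros((m, m))
d = np.array(coeffs, dtype=float)
for i in range(m):
    predicted += np.polyval(d, alpha) / math.factorial(i) * np.eye(m, k=i)
    d = np.polyder(d)                 # move on to the next derivative
print(fJ)
```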


8.4.1 Application to the inverse matrices 


The Cayley-Hamilton theorem can be used to find the inverse of a nonsingular matrix. If $f(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0$ is the characteristic polynomial of a matrix $A$, then

$$0 = f(A) = A^n + a_{n-1}A^{n-1} + \cdots + a_1A + a_0I,$$

or

$$-a_0I = (A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1I)A.$$

Since $a_0 = f(0) = \det(0I - A) = \det(-A) = (-1)^n\det A$, $A$ is nonsingular if and only if $a_0 = (-1)^n\det A \ne 0$. Therefore, if $A$ is nonsingular,

$$A^{-1} = -\frac{1}{a_0}(A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1I).$$


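Example 8.15 The characteristic polynomial of the matrix

$$A = \begin{bmatrix}4 & 2 & -2\\ -5 & 3 & 2\\ -2 & 4 & 1\end{bmatrix}$$

is $f(\lambda) = \det(\lambda I - A) = \lambda^3 - 8\lambda^2 + 17\lambda - 10$, and the Cayley-Hamilton theorem yields

$$A^3 - 8A^2 + 17A = 10I.$$

Hence

$$A^{-1} = \frac{1}{10}(A^2 - 8A + 17I) = \frac{1}{10}\left(\begin{bmatrix}10 & 6 & -6\\ -39 & 7 & 18\\ -30 & 12 & 13\end{bmatrix} - 8\begin{bmatrix}4 & 2 & -2\\ -5 & 3 & 2\\ -2 & 4 & 1\end{bmatrix} + 17\begin{bmatrix}1 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1\end{bmatrix}\right) = \frac{1}{10}\begin{bmatrix}-5 & -10 & 10\\ 1 & 0 & 2\\ -14 & -20 & 22\end{bmatrix}.$$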
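The formula for $A^{-1}$ translates directly into code. This NumPy sketch (the function name is hypothetical) takes the coefficients from `numpy.poly`, forms $A^{n-1} + a_{n-1}A^{n-2} + \cdots + a_1I$ by Horner's rule, and divides by $-a_0$. It illustrates the identity only; for large or ill-conditioned matrices a standard routine such as `numpy.linalg.inv` is the safer choice.

```python
import numpy as np

def inverse_via_cayley_hamilton(A):
    """A^{-1} = -(1/a_0)(A^{n-1} + a_{n-1}A^{n-2} + ... + a_1 I)."""
    coeffs = np.poly(A)            # [1, a_{n-1}, ..., a_1, a_0]
    a0 = coeffs[-1]
    if np.isclose(a0, 0.0):
        raise ValueError("a_0 = (-1)^n det A = 0, so A is singular")
    n = A.shape[0]
    B = np.zeros((n, n))
    for c in coeffs[:-1]:          # Horner: A^{n-1} + a_{n-1}A^{n-2} + ... + a_1 I
        B = B @ A + c * np.eye(n)
    return -B / a0

A = np.array([[4.0, 2.0, -2.0],
              [-5.0, 3.0, 2.0],
              [-2.0, 4.0, 1.0]])
Ainv = inverse_via_cayley_hamilton(A)
print(10 * Ainv)    # approximately [[-5, -10, 10], [1, 0, 2], [-14, -20, 22]]
```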


Problem 8.9 Let $A$ and $B$ be square matrices, not necessarily of the same size, and let $f(\lambda) = \det(\lambda I - A)$ be the characteristic polynomial of $A$. Show that $f(B)$ is invertible if and only if $A$ has no eigenvalue in common with $B$.


8.4.2 Application to matrix polynomials 


The Cayley-Hamilton theorem can also be used to simplify the calculation of matrix polynomials. Let $p(\lambda)$ be any polynomial and let $f(\lambda)$ be the characteristic polynomial of an $n\times n$ matrix $A$. By the division algorithm for polynomials, one can find two polynomials $q(\lambda)$ and $r(\lambda)$ such that

$$p(\lambda) = q(\lambda)f(\lambda) + r(\lambda),$$

where the degree of $r(\lambda)$ is less than that of $f(\lambda)$. Then, since $f(A) = 0$,

$$p(A) = q(A)f(A) + r(A) = r(A).$$

Thus the problem of evaluating a polynomial of an $n\times n$ matrix $A$, in particular $A^k$, can be reduced to the problem of evaluating a polynomial of degree less than $n$.


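Example 8.16 The characteristic polynomial of the matrix $A = \begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix}$ is $f(\lambda) = \lambda^2 - 2\lambda - 3$. Let $p(\lambda) = \lambda^4 - 7\lambda^3 - 3\lambda^2 + \lambda + 4$ be a polynomial. A straightforward calculation shows that

$$p(\lambda) = (\lambda^2 - 5\lambda - 10)f(\lambda) - 34\lambda - 26.$$

Therefore

$$p(A) = (A^2 - 5A - 10I)f(A) - 34A - 26I = -34A - 26I = -34\begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix} - 26\begin{bmatrix}1 & 0\\ 0 & 1\end{bmatrix} = \begin{bmatrix}-60 & -68\\ -68 & -60\end{bmatrix}.$$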
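The reduction $p(A) = r(A)$ can be reproduced with `numpy.polydiv`, which returns the quotient and remainder of polynomial division (coefficients listed from the highest degree). The sketch below uses the data of Example 8.16 and also evaluates $p(A)$ directly for comparison; `matrix_poly` is a hypothetical helper.

```python
import numpy as np

def matrix_poly(coeffs, A):
    """Horner evaluation of a polynomial (highest degree first) at A."""
    n = A.shape[0]
    result = np.zeros((n, n))
    for c in coeffs:
        result = result @ A + c * np.eye(n)
    return result

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])
p = [1, -7, -3, 1, 4]             # p(x) = x^4 - 7x^3 - 3x^2 + x + 4
f = np.poly(A)                    # characteristic polynomial x^2 - 2x - 3
q, r = np.polydiv(p, f)           # p = q f + r with deg r < deg f

reduced = matrix_poly(r, A)       # r(A), a polynomial of degree < 2
direct = matrix_poly(p, A)        # p(A) evaluated directly
print(q, r)                       # approximately [1, -5, -10] and [-34, -26]
print(reduced)                    # approximately [[-60, -68], [-68, -60]]
```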


Problem 8.10 For the matrix $A = \begin{bmatrix}1 & 0 & 1\\ 0 & 2 & 0\\ 0 & 0 & 2\end{bmatrix}$, evaluate the matrix polynomial


AS FIAI 4+ A? — A? 44A4+ 6]. 


8.4.3 Applications to computations of $A^k$ and $e^A$ II


Once we have the Cayley-Hamilton Theorem, we can use it to compute $A^k$ or $e^A$ without finding the Jordan canonical form of $A$ or a transition matrix $Q$. Let $f(\lambda) = \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0$ be the characteristic polynomial of $A$. Then $f(A) = 0$ implies that

$$A^n = -a_{n-1}A^{n-1} - \cdots - a_1A - a_0I.$$

Thus, for any $k \ge n$, the power $A^k$ can be computed as a matrix polynomial of degree less than $n$. This fact implies that the computation of $e^A = \sum_{k=0}^{\infty}\frac{1}{k!}A^k$, an infinite series of powers of $A$, can also be reduced to that of a matrix polynomial $g(A)$ of degree at most $n - 1$. However, the computation of the coefficients of $g$ might not be easy. This problem can be handled if we consider a polynomial $g(\lambda)$ of the smallest degree such that $g(A) = 0$. If we require the coefficient of the highest power of $\lambda$ in $g$ to be 1, such a polynomial exists uniquely, and is called the minimal polynomial of $A$. By the Cayley-Hamilton Theorem, the degree of $g$ is less than or equal to that of $f$. Moreover, by the definition of the minimal polynomial $g$, it is easy to see that $g(\lambda)$ is a factor of $f(\lambda)$. The following are well-known facts:


(1) If the eigenvalues of $A$ are all distinct, the minimal polynomial is the characteristic polynomial:

$$g(\lambda) = f(\lambda) = (\lambda - \lambda_1)\cdots(\lambda - \lambda_n),$$

where the $\lambda_i$ are the $n$ distinct eigenvalues of $A$.

(2) For each eigenvalue $\lambda_i$ with multiplicity $m_i \ge 1$, $g(\lambda)$ has the factor $(\lambda - \lambda_i)^{p_i}$, where the highest power $p_i$ is the order of the largest Jordan block among the Jordan blocks with $\lambda_i$ on the diagonal. Thus, if $\dim E_{\lambda_i} = m_i$, i.e., $\lambda_i$ has $m_i$ linearly independent eigenvectors so that $p_i = 1$, then $(\lambda - \lambda_i)$ appears in $g(\lambda)$ only to the first power. In particular, if $A$ is diagonalizable, then the minimal polynomial is $g(\lambda) = (\lambda - \lambda_1)\cdots(\lambda - \lambda_s)$, where the $\lambda_i$ are the distinct eigenvalues of $A$.

(3) In general, if $A$ has $s$ distinct eigenvalues $\lambda_1, \ldots, \lambda_s$ with multiplicities $m_1, \ldots, m_s$, respectively, and $1 \le p_i \le m_i$, $i = 1, \ldots, s$, denote the orders of the largest Jordan blocks belonging to $\lambda_i$, then the minimal polynomial is

$$g(\lambda) = (\lambda - \lambda_1)^{p_1}\cdots(\lambda - \lambda_s)^{p_s},$$

so that the degree of $g(\lambda)$ is $q = \sum_{i=1}^{s}p_i \le n$.
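Because $g$ is the monic polynomial of least degree with $g(A) = 0$, its degree is the first $q$ for which $I, A, \ldots, A^q$ are linearly dependent. The sketch below (a hypothetical helper using SymPy's exact arithmetic, not a method from the text) finds that dependence as a null vector and scales it to be monic. For a matrix with Jordan blocks of orders 2 and 1 at $\lambda = 2$, fact (3) predicts $(t - 2)^2$.

```python
import sympy as sp

def minimal_polynomial(A, t):
    """Monic polynomial g of least degree with g(A) = 0.

    Finds the first q such that I, A, ..., A^q are linearly dependent
    (flattened to vectors of length n^2), using exact arithmetic.
    """
    n = A.shape[0]
    powers = [sp.eye(n)]
    for q in range(1, n + 1):
        powers.append(powers[-1] * A)
        M = sp.Matrix.hstack(*[P.reshape(n * n, 1) for P in powers])
        null = M.nullspace()
        if null:
            c = null[0] / null[0][q]      # coefficient of t^q becomes 1
            return sp.expand(sum(c[i] * t**i for i in range(q + 1)))
    raise RuntimeError("unreachable by the Cayley-Hamilton theorem")

t = sp.Symbol('t')
A = sp.Matrix([[2, 1, 0], [0, 2, 0], [0, 0, 2]])   # hypothetical example
print(minimal_polynomial(A, t))                     # (t - 2)^2, expanded
```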


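Let $g(\lambda) = \lambda^q + a_{q-1}\lambda^{q-1} + \cdots + a_1\lambda + a_0$ be the minimal polynomial of $A$. Then $g(A) = 0$ implies that, for any $k \ge q$, we may set up

$$A^k = x_0I + x_1A + \cdots + x_{q-1}A^{q-1},\quad\text{or}\quad e^A = x_0I + x_1A + \cdots + x_{q-1}A^{q-1}$$

for some coefficients $x_i$. We can now determine those coefficients as follows. Let $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ be a maximal chain of generalized eigenvectors belonging to an eigenvalue $\lambda$. Then

$$A\mathbf{v}_p = \lambda\mathbf{v}_p + \mathbf{v}_{p-1},\ \ldots,\ A\mathbf{v}_2 = \lambda\mathbf{v}_2 + \mathbf{v}_1,\ A\mathbf{v}_1 = \lambda\mathbf{v}_1.$$

From the last equation, $A^k\mathbf{v}_1 = \lambda^k\mathbf{v}_1 = (x_0I + x_1A + \cdots + x_{q-1}A^{q-1})\mathbf{v}_1$. Thus we get

$$\lambda^k = x_0 + \lambda x_1 + \cdots + \lambda^{q-1}x_{q-1}.$$

Similarly, from the second to last equation, $A^k\mathbf{v}_2 = \lambda^k\mathbf{v}_2 + k\lambda^{k-1}\mathbf{v}_1$. On the other hand, by a direct computation we get

$$\begin{aligned}A^k\mathbf{v}_2 &= (x_0I + x_1A + \cdots + x_{q-1}A^{q-1})\mathbf{v}_2\\ &= (x_0 + \lambda x_1 + \cdots + \lambda^{q-1}x_{q-1})\mathbf{v}_2 + (x_1 + 2\lambda x_2 + \cdots + (q-1)\lambda^{q-2}x_{q-1})\mathbf{v}_1.\end{aligned}$$

Since $\mathbf{v}_2$ and $\mathbf{v}_1$ are linearly independent, we have

$$\begin{aligned}\lambda^k &= x_0 + \lambda x_1 + \lambda^2x_2 + \cdots + \lambda^{q-1}x_{q-1},\\ k\lambda^{k-1} &= x_1 + 2\lambda x_2 + \cdots + (q-1)\lambda^{q-2}x_{q-1}.\end{aligned}$$

Note that the second equation is just the derivative of the first one with respect to $\lambda$. By the same computations with $A^k\mathbf{v}_3, \ldots, A^k\mathbf{v}_p$, one gets the following additional $p - 2$ equations:

$$\begin{aligned}\binom{k}{2}\lambda^{k-2} &= x_2 + \binom{3}{2}\lambda x_3 + \cdots + \binom{q-1}{2}\lambda^{q-3}x_{q-1},\\ &\ \ \vdots\\ \binom{k}{p-1}\lambda^{k-p+1} &= x_{p-1} + \binom{p}{p-1}\lambda x_p + \cdots + \binom{q-1}{p-1}\lambda^{q-p}x_{q-1}.\end{aligned}$$

Note also that these equations are the higher derivatives of the first equation, each divided by the corresponding factorial $j!$. Thus, for each eigenvalue $\lambda_i$ whose largest Jordan block has order $p_i$, we get $p_i$ equations, and so, all together, a system of $q = \sum_{i=1}^{s}p_i$ equations in the $q$ unknowns $x_i$. It is easy to see that the coefficient matrix is invertible, so that the unknowns $x_0, x_1, \ldots, x_{q-1}$ are uniquely determined.

The same computation can be applied to

$$e^A = x_0I + x_1A + \cdots + x_{q-1}A^{q-1} = h(A).$$

For a maximal chain of generalized eigenvectors $\{\mathbf{v}_1, \ldots, \mathbf{v}_p\}$ belonging to an eigenvalue $\lambda$, by computing both sides of

$$e^A\mathbf{v}_j = (x_0I + x_1A + \cdots + x_{q-1}A^{q-1})\mathbf{v}_j$$

for $j = 1, \ldots, p$ separately, we get the following $p$ equations:

$$\begin{aligned}e^\lambda &= h(\lambda) = x_0 + \lambda x_1 + \cdots + \lambda^{q-1}x_{q-1},\\ e^\lambda &= h'(\lambda) = x_1 + 2\lambda x_2 + \cdots + (q-1)\lambda^{q-2}x_{q-1},\\ &\ \ \vdots\\ e^\lambda &= h^{(p-1)}(\lambda).\end{aligned}$$

With the same argument as in the case of $A^k$, one can determine the coefficients $x_i$ uniquely.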
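The equations above form a square linear system for $x_0, \ldots, x_{q-1}$. The following sketch builds it from the distinct eigenvalues $\lambda_i$, the orders $p_i$ of their largest Jordan blocks, and the derivatives of $\varphi(t) = t^k$ or $\varphi(t) = e^t$. It uses undivided derivatives, $h^{(j)}(\lambda) = \varphi^{(j)}(\lambda)$, which differ from the binomial form above only by the factor $j!$ on both sides. The helper names and the $3\times 3$ Jordan block used as a test are hypothetical; `math.perm(i, j)` computes $i!/(i-j)!$ and needs Python 3.8 or later.

```python
import math
import numpy as np

def coefficients(eigs, orders, phi_deriv):
    """Solve for x_0..x_{q-1} in phi(A) = x_0 I + ... + x_{q-1} A^{q-1}.

    eigs      : distinct eigenvalues lambda_i of A
    orders    : p_i, order of the largest Jordan block for lambda_i
    phi_deriv : phi_deriv(lam, j) = j-th derivative of phi at lam
    """
    q = sum(orders)
    rows, rhs = [], []
    for lam, p in zip(eigs, orders):
        for j in range(p):
            # j-th derivative of t^i is i!/(i-j)! t^(i-j) for i >= j, else 0
            rows.append([math.perm(i, j) * lam ** (i - j) if i >= j else 0.0
                         for i in range(q)])
            rhs.append(phi_deriv(lam, j))
    return np.linalg.solve(np.array(rows, dtype=float), np.array(rhs, dtype=float))

def poly_in_A(x, A):
    """x_0 I + x_1 A + x_2 A^2 + ..."""
    result, P = np.zeros_like(A, dtype=float), np.eye(A.shape[0])
    for xi in x:
        result = result + xi * P
        P = P @ A
    return result

k = 5
J = np.array([[2.0, 1.0, 0.0], [0.0, 2.0, 1.0], [0.0, 0.0, 2.0]])  # g = (t - 2)^3
x_pow = coefficients([2.0], [3], lambda lam, j: math.perm(k, j) * lam ** (k - j))
x_exp = coefficients([2.0], [3], lambda lam, j: math.exp(lam))
print(x_pow)   # 2^(k-1)(k-1)(k-2), k(2-k)2^(k-1), k(k-1)2^(k-3): here 192, -240, 80
print(x_exp)   # e^2, -e^2, e^2/2
```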


Example 8.17 Compute A* and e^ for 


ON NF 
wo 
E 




Solution: The characteristic polynomial of $A$ is $f(\lambda) = (\lambda - 2)^4$. Since, by a direct computation, $(A - 2I)^3 = 0$ and $(A - 2I)^2 \ne 0$, the Jordan canonical form $J$ of $A$ must be of the form

$$J = \begin{bmatrix}2 & 1 & 0 & 0\\ 0 & 2 & 1 & 0\\ 0 & 0 & 2 & 0\\ 0 & 0 & 0 & 2\end{bmatrix}.$$

Thus the minimal polynomial of $A$ is $g(\lambda) = (\lambda - 2)^3$. Thus we can set $A^k = x_0I + x_1A + x_2A^2$, and for $\lambda = 2$,


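$$\begin{aligned}2^k &= x_0 + 2x_1 + 2^2x_2,\\ k2^{k-1} &= x_1 + 2\cdot 2x_2,\\ k(k-1)2^{k-2} &= 2x_2.\end{aligned}$$

The solution is

$$x_0 = 2^{k-1}(k-1)(k-2),\qquad x_1 = k(2-k)2^{k-1},\qquad x_2 = k(k-1)2^{k-3},$$

so that

$$A^k = 2^{k-1}(k-1)(k-2)I + k(2-k)2^{k-1}A + k(k-1)2^{k-3}A^2$$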
ka- L ROR’ kma (Ikk) k2k=1 ] 


M 0 2r 2" 0 
E 0 0 of 0 : 
—k2571 ROR” hoe hee oe 
Similarly, for $e^A = x_0I + x_1A + x_2A^2$ and $\lambda = 2$,

$$\begin{aligned}e^2 &= x_0 + 2x_1 + 2^2x_2,\\ e^2 &= x_1 + 2\cdot 2x_2,\\ e^2 &= 2x_2.\end{aligned}$$

The solution is $x_0 = e^2$, $x_1 = -e^2$, $x_2 = \frac{1}{2}e^2$, so that
0121 
1 012 0 
A _ 3207 _ =e eee: 
er Se A+ 5A’) e 0010 
-1 1 52 


8.5 Exercises

8.1. Show that if $A$ is nonsingular, then $A^{-1}$ has the same block structure in its Jordan canonical form as $A$ does.


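8.2. Find the number of linearly independent eigenvectors for each of the following matrices:

$$(1)\ \begin{bmatrix}1&1&0&0&0\\0&1&1&0&0\\0&0&1&0&0\\0&0&0&3&1\\0&0&0&0&8\end{bmatrix},\qquad (2)\ \begin{bmatrix}2&0&0&0&0\\0&2&0&0&0\\0&0&2&0&0\\0&0&0&5&1\\0&0&0&0&5\end{bmatrix},\qquad (3)\ \begin{bmatrix}2&1&0&0&0\\0&2&0&0&0\\0&0&3&0&0\\0&0&0&3&0\\0&0&0&0&5\end{bmatrix}.$$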
8.3. Solve the system of linear equations

$$\begin{aligned}(1-i)x + (1+i)y &= 2 - i\\ (1+i)x + (1+i)y &= 1 + 3i.\end{aligned}$$
3 1 2 
8.4. Let A=| i s | and y_0=| 9 |.
(1) Solve $\mathbf{y}_n = A\mathbf{y}_{n-1}$ with $\mathbf{y}_0$.
(2) Solve $\mathbf{y}' = A\mathbf{y}$ with $\mathbf{y}(0) = \mathbf{y}_0$.
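8.5. Let $A = \begin{bmatrix}-6 & 24 & 8\\ -1 & 8 & 4\\ 2 & -12 & -6\end{bmatrix}$ and $\mathbf{y}_0 = \begin{bmatrix}2\\ 1\\ 0\end{bmatrix}$.
(1) Solve $\mathbf{y}_n = A\mathbf{y}_{n-1}$ with $\mathbf{y}_0$.
(2) Solve $\mathbf{y}' = A\mathbf{y}$ with $\mathbf{y}(1) = \mathbf{y}_0$.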
8.6. Solve the initial value problem

$$\begin{aligned}y_1' &= -y_1 + 2y_3, & y_1(0) &= -2,\\ y_2' &= 2y_1 + y_2 - 2y_3, & y_2(0) &= 0,\\ y_3' &= -2y_1 + 3y_3, & y_3(0) &= -1.\end{aligned}$$
8.7. Find the Jordan canonical form for A = | ; : |, and compute $e^A$.


8.8. Consider a $2\times 2$ matrix $A = \begin{bmatrix}a & b\\ c & d\end{bmatrix}$.

(1) Find a necessary and sufficient condition for $A$ to be diagonalizable.
(2) The characteristic polynomial for $A$ is $f(t) = t^2 - (a + d)t + (ad - bc)$. Show that $f(A) = 0$.


8.9. For each of the following matrices, find a polynomial of which the matrix is a root.


OleOle AO 03 




8.10. Verify that each of the matrices below satisfies its own characteristic polynomial, and from these results compute $A^{-1}$, if it exists.


1 0 1 

Bea a @]o2 0 

0 0 2 

8.11. For $f(x) = 3x^3 + x^2 - 2x + 3$, compute $f(A)$ for
1 2 0 1 -l 2 
Cees MO. @BA=|0 2 -1 
00 5 0 0 3 


8.12. Show that a Jordan block $J$ is similar to its transpose, $J^T = P^{-1}JP$, by the permutation matrix $P = [\mathbf{e}_n \cdots \mathbf{e}_1]$. Deduce that every matrix is similar to its transpose.


8.13. For any square matrix $A = [a_{ij}]$, define $\|A\| = \max\{|a_{ij}| : 1 \le i, j \le n\}$. Prove the following for any square matrices $A$, $B$ and $c \in \mathbb{C}$:

(1) $\|A\| \ge 0$, and equality holds if and only if $A$ is the zero matrix.
(2) $\|cA\| = |c|\,\|A\|$.
(3) $\|A + B\| \le \|A\| + \|B\|$.
(4) $\|AB\| \le n\|A\|\,\|B\|$.


8.14. Let $A$ be a stochastic matrix, and let $Q^{-1}AQ = J$ be the Jordan canonical form of $A$.
(1) Prove $\|A^k\| \le 1$ for any positive integer $k$.
(2) Deduce that $\{\|J^k\| : k = 1, 2, \ldots\}$ is bounded.
(3) Prove that each Jordan block belonging to the eigenvalue $\lambda = 1$ of $A$ is $1\times 1$.
(4) Prove that $\lim_{n\to\infty}A^n$ exists if and only if any eigenvalue $\lambda$ of $A$ such that $|\lambda| = 1$ is in fact $\lambda = 1$.


8.15. Compute $A^k$ and $e^A$ for


1 1 1 0 

: 1 0 1 
Oe e MOO Tee Oa a 
1 4 001 0 01 1 
0 0 0 2 


8.16. Determine whether the following statements are true or false, in general, and justify your answers.

(1) Any square matrix is similar to a triangular matrix.

(2) If a matrix $A$ has exactly $k$ linearly independent eigenvectors, then the Jordan canonical form of $A$ has $k$ Jordan blocks.

(3) If a matrix $A$ has $k$ distinct eigenvalues, then the Jordan canonical form of $A$ has $k$ Jordan blocks.

(4) If a $4\times 4$ matrix $A$ has eigenvalues 1 and 2, each of multiplicity 2, such that $\dim E_1 = 2$ and $\dim E_2 = 1$, then the Jordan canonical form of $A$ has three Jordan blocks.

(5) If $\lambda_1, \ldots, \lambda_k$ are $k$ distinct eigenvalues of $A$ with multiplicities $m_i$ and $\dim E_{\lambda_i} \ne m_i$, then $A$ is not diagonalizable.

(6) For any Jordan block $J$ with eigenvalue $\lambda$, $\det e^J = e^\lambda$.

(7) If $f(x)$ is a polynomial and $A$ is a square matrix such that $f(A) = 0$, then $f(x)$ is a multiple of the characteristic polynomial of $A$.


Chapter 9 


Quadratic Forms 


9.1 Introduction 


So far we have studied systems of linear equations $A\mathbf{x} = \mathbf{b}$, each of whose equations has as its left side a (homogeneous) polynomial of degree 1 in $n$ real variables: $\mathbf{a}^T\mathbf{x} = a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$. Equations given by polynomials of degree 2 in several variables, called quadratic equations, also play very important roles in mathematics and in applications. They can also be expressed in terms of symmetric matrices, so the problem is reduced to studying symmetric matrices.

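For example, a quadratic equation in two variables $x$ and $y$ is an equation of the form

$$ax^2 + 2bxy + cy^2 + dx + ey + f = 0,$$

in which the left side consists of a constant term $f$, a linear form $dx + ey$, and a quadratic form $ax^2 + 2bxy + cy^2$. Note that the linear form and the quadratic form may be written as

$$dx + ey = [d\ \ e]\begin{bmatrix}x\\ y\end{bmatrix},\qquad ax^2 + 2bxy + cy^2 = [x\ \ y]\begin{bmatrix}a & b\\ b & c\end{bmatrix}\begin{bmatrix}x\\ y\end{bmatrix} = \mathbf{x}^TA\mathbf{x},$$

where

$$\mathbf{x} = \begin{bmatrix}x\\ y\end{bmatrix},\qquad A = \begin{bmatrix}a & b\\ b & c\end{bmatrix}.$$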


Quadratic equations arise in a variety of applications, including geometry, number theory, vibrations of mechanical systems, statistics, and electrical engineering. For instance, as we have seen in calculus courses, many problems involve finding the maximum or the minimum of a function of several variables, and one can usually use the first and second derivative tests for such problems. In fact, the second derivative test may be phrased in terms of a quadratic form. We will discuss this later in Section 9.5.

In general, quadratic equations are defined over the complex field $\mathbb{C}$ and the complex vector space $\mathbb{C}^n$. However, when the equations are over the real field $\mathbb{R}$, the terminology is simply rephrased, using the transpose $T$ and symmetric matrices instead of the Hermitian conjugate $H$ and Hermitian matrices.


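Definition 9.1 An equation of the form

$$f(\mathbf{x}) = \sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}\bar{x}_ix_j + \sum_{i=1}^{n}b_ix_i + c = \mathbf{x}^HA\mathbf{x} + \mathbf{b}^T\mathbf{x} + c = 0,$$

where $A = [a_{ij}]$ is an $n\times n$ complex matrix, $\mathbf{x} = [x_1 \cdots x_n]^T$, $\mathbf{b} = [b_1 \cdots b_n]^T \in \mathbb{C}^n$ and $c \in \mathbb{C}$, is called a quadratic equation in $n$ variables $x_1, x_2, \ldots, x_n$. The first part of the right side,

$$q(\mathbf{x}) = \mathbf{x}^HA\mathbf{x} = [\bar{x}_1 \cdots \bar{x}_n]\,A\begin{bmatrix}x_1\\ \vdots\\ x_n\end{bmatrix} = \sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}\bar{x}_ix_j,$$

is called a quadratic form on $\mathbb{C}^n$, which is a homogeneous polynomial of degree 2 in $n$ variables $x_1, x_2, \ldots, x_n$ (and their conjugates). The second part,

$$\mathbf{b}^T\mathbf{x} = \sum_{i=1}^{n}b_ix_i,$$

is called a linear form on $\mathbb{C}^n$, which is a homogeneous polynomial of degree 1 in $n$ variables $x_1, x_2, \ldots, x_n$.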


Remark: (1) A quadratic equation is said to be consistent if it has a solution, i.e., there is a vector $\mathbf{x} \in \mathbb{C}^n$ such that $f(\mathbf{x}) = 0$. Otherwise, it is said to be inconsistent. For instance, the equation $2x^2 + 3y^2 = -1$ in $\mathbb{R}^2$ is inconsistent. In the following discussion we will consider only consistent equations.

(2) The matrix $A$ in a quadratic form can actually be restricted to be a Hermitian matrix in the complex case, or a symmetric matrix in the real case.




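In fact, any real square matrix $A$ can be expressed as the sum of a symmetric matrix $B$ and a skew-symmetric matrix $C$, say

$$A = B + C,\quad\text{where}\quad B = \frac{1}{2}(A + A^T)\quad\text{and}\quad C = \frac{1}{2}(A - A^T).$$

For the skew-symmetric matrix $C$, we have

$$\mathbf{x}^TC\mathbf{x} = (\mathbf{x}^TC\mathbf{x})^T = \mathbf{x}^TC^T\mathbf{x} = -\mathbf{x}^TC\mathbf{x}.$$

Hence, as a real number, $\mathbf{x}^TC\mathbf{x} = 0$. Therefore,

$$q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} = \mathbf{x}^T(B + C)\mathbf{x} = \mathbf{x}^TB\mathbf{x}.$$

This means that, without loss of generality, one may assume that the matrix $A$ in a real quadratic form is a symmetric matrix.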
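The decomposition $A = B + C$ and the vanishing of $\mathbf{x}^TC\mathbf{x}$ can be checked numerically on an arbitrary real matrix; the random seed and the $3\times 3$ size below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(-5, 6, size=(3, 3)).astype(float)   # arbitrary real matrix
B = (A + A.T) / 2          # symmetric part
C = (A - A.T) / 2          # skew-symmetric part

x = rng.standard_normal(3)
print(x @ C @ x)           # zero up to rounding
print(x @ A @ x, x @ B @ x)  # the same value twice, up to rounding
```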

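For a complex quadratic form $A$, we can write $A = B + iC$ for some Hermitian matrices $B$ and $C$ (see page 296). Hence, for any $\mathbf{x} \in \mathbb{C}^n$,

$$\mathbf{x}^HA\mathbf{x} = \mathbf{x}^H(B + iC)\mathbf{x} = \mathbf{x}^HB\mathbf{x} + i\,\mathbf{x}^HC\mathbf{x},$$

in which $\mathbf{x}^HB\mathbf{x}$ and $\mathbf{x}^HC\mathbf{x}$ are real numbers by Theorem 7.1. Hence, a complex quadratic form $A$ on $\mathbb{C}^n$ may be assumed to be a Hermitian matrix. Thus, for a Hermitian $A$, $\mathbf{x}^HA\mathbf{x}$ is always a real number for any $\mathbf{x} \in \mathbb{C}^n$.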


In the real case, the solution set of a consistent quadratic equation $f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} + \mathbf{b}^T\mathbf{x} + c = 0$ is a level surface in $\mathbb{R}^n$, that is, a curved surface that can be parameterized by $n - 1$ variables. In particular, if $n = 2$, the solution set of a quadratic equation is called a quadratic curve, or more commonly a conic section. When $n \ge 3$, it is called a quadratic surface, which is of one of three types: a paraboloid, an ellipsoid, or a hyperboloid.

Our main concern in this chapter is the study of the graph, or the solu- 
tions, of such a quadratic equation. 


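Example 9.1 The standard three types of conic sections:

(1) (circle or ellipse) $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ with $A = \begin{bmatrix}\frac{1}{a^2} & 0\\ 0 & \frac{1}{b^2}\end{bmatrix}$,

(2) (hyperbola) $\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$ or $-\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$ with $A = \begin{bmatrix}\frac{1}{a^2} & 0\\ 0 & -\frac{1}{b^2}\end{bmatrix}$ or $A = \begin{bmatrix}-\frac{1}{a^2} & 0\\ 0 & \frac{1}{b^2}\end{bmatrix}$.

Figure 9.1: Ellipses and Hyperbolas

Figure 9.2: Parabolas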


Problem 9.1 Find the symmetric matrices representing the quadratic forms
(1) $9x_1^2 - x_2^2 + 4x_3^2 + 6x_1x_2 - 8x_1x_3 + 2x_2x_3$,
(2) $x_1x_2 + x_1x_3 + x_2x_3$,
(3) $x_1^2 + x_2^2 - x_3^2 - x_4^2 + 2x_1x_2 - 10x_1x_4 + 4x_3x_4$.


9.2 Diagonalization of a quadratic form 


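Primarily, we are concerned with the types of the quadratic surfaces in $\mathbb{R}^n$ given by $f(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} + \mathbf{b}^T\mathbf{x} + c = 0$. We will see that, in this problem, the signs of the eigenvalues of $A$ play very important roles.

We first consider the general case $\mathbf{b} \ne \mathbf{0}$, that is, with a linear form present.

If the matrix $A$ is invertible (with no zero eigenvalues), then, by a change of variables given by the translation $\mathbf{y} = \mathbf{x} + \frac{1}{2}A^{-1}\mathbf{b}$ (or $\mathbf{x} = \mathbf{y} - \frac{1}{2}A^{-1}\mathbf{b}$), the given quadratic equation is transformed into a new quadratic equation $\mathbf{y}^TA\mathbf{y} = d$ without a linear form, where $d = \frac{1}{4}\mathbf{b}^TA^{-1}\mathbf{b} - c$ (expand $(\mathbf{y} - \frac{1}{2}A^{-1}\mathbf{b})^TA(\mathbf{y} - \frac{1}{2}A^{-1}\mathbf{b})$ using $A^T = A$). This case will be discussed further in the following.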
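The translation is a short computation once $A^{-1}\mathbf{b}$ is known. Here is a NumPy sketch using the data of Example 9.2 (the helper name is hypothetical). Expanding with $A^T = A$ gives $f(\mathbf{y} - \mathbf{h}) = \mathbf{y}^TA\mathbf{y} - \frac{1}{4}\mathbf{b}^TA^{-1}\mathbf{b} + c$ for $\mathbf{h} = \frac{1}{2}A^{-1}\mathbf{b}$, so the new constant is $d = \frac{1}{4}\mathbf{b}^TA^{-1}\mathbf{b} - c$, which is $\frac{1}{3}$ for this example.

```python
import numpy as np

def remove_linear_term(A, b, c):
    """For x^T A x + b^T x + c = 0 with A symmetric and invertible,
    return h and d such that the substitution x = y - h gives y^T A y = d."""
    Ainv_b = np.linalg.solve(A, b)        # A^{-1} b without forming A^{-1}
    h = 0.5 * Ainv_b
    d = 0.25 * (b @ Ainv_b) - c
    return h, d

# 3x^2 - 6xy + 4y^2 + 2x - 2y = 0
A = np.array([[3.0, -3.0], [-3.0, 4.0]])
b = np.array([2.0, -2.0])
h, d = remove_linear_term(A, b, 0.0)
print(h, d)                               # h = [1/3, 0], d = 1/3

# Spot-check: f(y - h) should equal y^T A y - d for any y
y = np.array([0.7, -1.2])
x = y - h
print(x @ A @ x + b @ x, y @ A @ y - d)
```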


Example 9.2 Determine the conic section for $3x^2 - 6xy + 4y^2 + 2x - 2y = 0$.


Solution: For the quadratic form $3x^2 - 6xy + 4y^2$, $A = \begin{bmatrix}3 & -3\\ -3 & 4\end{bmatrix}$. It is invertible and

$$A^{-1} = \frac{1}{3}\begin{bmatrix}4 & 3\\ 3 & 3\end{bmatrix}\quad\text{and}\quad\mathbf{b} = \begin{bmatrix}2\\ -2\end{bmatrix}.$$

With the change of variables $\mathbf{y} = \mathbf{x} + \frac{1}{2}A^{-1}\mathbf{b}$, that is, $x' = x + \frac{1}{3}$, $y' = y$, the equation is transformed to a new equation $3(x')^2 - 6x'y' + 4(y')^2 = \frac{1}{3}$.

Clearly, the matrix representation of the new quadratic form is also $A$, and so the conic section is determined by $A$. See Example 9.9 below.


However, if $A$ is not invertible (with some zero eigenvalues), then in general the linear form cannot be eliminated by a change of variables, and the quadratic surface is of parabolic type. For example, the equation $x^2 - z = c$ has a singular quadratic form with a nonzero linear form that cannot be removed by any change of variables. The type of this quadratic surface is a paraboloid when $n = 3$.


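Example 9.3 (Paraboloids) $\frac{x^2}{a^2} + \frac{y^2}{b^2} = \frac{z}{c}$ (elliptic) or $\frac{x^2}{a^2} - \frac{y^2}{b^2} = \frac{z}{c}$ (hyperbolic), $c > 0$, with

$$A = \begin{bmatrix}\frac{1}{a^2} & 0 & 0\\ 0 & \frac{1}{b^2} & 0\\ 0 & 0 & 0\end{bmatrix}\quad\text{or}\quad A = \begin{bmatrix}\frac{1}{a^2} & 0 & 0\\ 0 & -\frac{1}{b^2} & 0\\ 0 & 0 & 0\end{bmatrix}.$$

Figure 9.3: Elliptic paraboloid and Hyperbolic paraboloid.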


In general, if a linear form is present, the types of the quadratic surfaces are elliptic, hyperbolic, or cylindrical paraboloids, depending on the signs of the eigenvalues of the quadratic form $A$, as discussed in the following.




Hence, we may now consider the quadratic part $q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ only. This may be rewritten as follows: for $\mathbf{x} = (x_1, \ldots, x_n) \in \mathbb{R}^n$,

$$q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} = \sum_{i=1}^{n}a_{ii}x_i^2 + 2\sum_{i<j}a_{ij}x_ix_j,$$

where the first part $\sum_i a_{ii}x_i^2$ is called the (perfect) square terms and the second part $2\sum_{i<j}a_{ij}x_ix_j$ is called the cross-product terms. Usually, the presence of the cross-product terms makes it hard to determine the type of the quadratic surface $q(\mathbf{x}) = c$.

However, since the matrix $A$ is symmetric, it can be diagonalized by a basis of orthonormal eigenvectors, i.e., for an orthogonal matrix $P$ whose columns are orthonormal eigenvectors of $A$,

$$P^{-1}AP = P^TAP = D = \begin{bmatrix}\lambda_1 & & 0\\ & \ddots & \\ 0 & & \lambda_n\end{bmatrix},$$

where the diagonal entries $\lambda_i$ are the eigenvalues of $A$. Now, by the change of variables $\mathbf{x} = P\mathbf{y}$, the quadratic form $q(\mathbf{x})$ is written as

$$q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x} = \mathbf{y}^T(P^TAP)\mathbf{y} = \mathbf{y}^TD\mathbf{y} = \lambda_1y_1^2 + \lambda_2y_2^2 + \cdots + \lambda_ny_n^2,$$

which is now a quadratic form without cross-product terms.

The same is true for a complex quadratic form $q(\mathbf{x}) = \mathbf{x}^HA\mathbf{x}$ with a Hermitian matrix $A$, since every Hermitian matrix $A$ is unitarily diagonalizable: $A = UDU^H$ for a real diagonal matrix $D$ and a unitary matrix $U$. Hence, by the change of variables $\mathbf{x} = U\mathbf{y}$, the quadratic form $q(\mathbf{x}) = \mathbf{y}^HD\mathbf{y}$ has only square terms.

In either case, real or complex, we consequently have the following theorem.


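Theorem 9.1 (The principal axes theorem) Let $q(\mathbf{x}) = \mathbf{x}^HA\mathbf{x}$ be a complex quadratic form on $\mathbb{C}^n$ with a Hermitian matrix $A$, and $U$ a unitary matrix that diagonalizes $A$: $U^HAU = D$. Then, for the change of coordinates $\mathbf{x} = U\mathbf{y}$ with $\mathbf{y} = [y_1\ y_2 \cdots y_n]^T$, we have

$$\mathbf{x}^HA\mathbf{x} = \mathbf{y}^HD\mathbf{y} = \lambda_1|y_1|^2 + \lambda_2|y_2|^2 + \cdots + \lambda_n|y_n|^2 \in \mathbb{R}.$$

For a real quadratic form $q(\mathbf{x}) = \mathbf{x}^TA\mathbf{x}$ with a symmetric matrix $A$, we take the orthogonal diagonalization $P^TAP = D$ of $A$ and the change of coordinates $\mathbf{x} = P\mathbf{y}$.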




Clearly, the columns of the matrix $U$ (or $P$, respectively) in Theorem 9.1 form a basis for $\mathbb{C}^n$ (for $\mathbb{R}^n$, respectively) that consists of orthonormal eigenvectors of $A$. These eigenvectors are called the principal axes of the quadratic form. The vector $\mathbf{y}$ is just the coordinate vector of $\mathbf{x}$ with respect to the principal axes.
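For a real symmetric (or complex Hermitian) matrix, `numpy.linalg.eigh` returns the eigenvalues in ascending order together with orthonormal eigenvectors as the columns of a matrix, i.e., a matrix $P$ as in Theorem 9.1. The sketch checks $\mathbf{x}^TA\mathbf{x} = \sum_i\lambda_iy_i^2$ for $\mathbf{y} = P^T\mathbf{x}$, using the symmetric matrix $\begin{bmatrix}3&1\\1&3\end{bmatrix}$ and an arbitrary vector.

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 3.0]])     # symmetric
eigvals, P = np.linalg.eigh(A)             # columns of P: orthonormal eigenvectors
x = np.array([1.5, -0.4])                  # an arbitrary vector
y = P.T @ x                                # coordinates w.r.t. principal axes (x = P y)
print(eigvals)                             # ascending: 2 and 4
print(x @ A @ x, np.sum(eigvals * y**2))   # equal up to rounding
```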


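Example 9.4 The standard four types of quadratic surfaces:

(1) (ellipsoids) $\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ with $A = \begin{bmatrix}\frac{1}{a^2} & 0 & 0\\ 0 & \frac{1}{b^2} & 0\\ 0 & 0 & \frac{1}{c^2}\end{bmatrix}$,

(2) (cones) $\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 0$ with $A = \begin{bmatrix}\frac{1}{a^2} & 0 & 0\\ 0 & \frac{1}{b^2} & 0\\ 0 & 0 & -\frac{1}{c^2}\end{bmatrix}$,

(3) (hyperboloids of one or two sheets) $\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1$ (of one sheet) or $-\frac{x^2}{a^2} - \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1$ (of two sheets) with

$$A = \begin{bmatrix}\frac{1}{a^2} & 0 & 0\\ 0 & \frac{1}{b^2} & 0\\ 0 & 0 & -\frac{1}{c^2}\end{bmatrix}\quad\text{or}\quad A = \begin{bmatrix}-\frac{1}{a^2} & 0 & 0\\ 0 & -\frac{1}{b^2} & 0\\ 0 & 0 & \frac{1}{c^2}\end{bmatrix}.$$

Figure 9.5: Hyperboloid of one sheet and Hyperboloid of two sheets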


Example 9.5 Consider a quadratic equation in two variables on R? : 


ax? + 2bxy + cy? + dz + ey + f =0. 


Or, in matrix form 
x’ Ax +b’x+ f =0, 


with the symmetric matrix A = E | b= | J and x = | . | in R. 


(1) If b = 0, then A is already a diagonal matrix with the eigenvalues a 
and c, and the equation becomes 


(i) Ifa =0 =c (i.e., A = 0), then the conic section is a line in the plane. 


(ii) If a # 0 = c, then it is a parabola when e Æ 0, or one or two lines 
when e = 0. 


(iii) Ifa # 0 Æ c, then the quadratic equation becomes 


ar? + cy? +dx +ey+ f =a(a—p)*?+cly—g)? +h=0 


for some constants p, q, and h. If h = 0, the cases are easily classified 
(try). Suppose h # 0. Then, the conic section is a circle if a = c, an 
ellipse if ac > 0, or a hyperbola if ac < 0. 


(2) Suppose that $b \neq 0$. Then $A$ can be diagonalized as $P^T AP = D$, where $P$ is an orthogonal matrix whose columns are orthonormal eigenvectors, and $D$ is the diagonal matrix of the eigenvalues $\lambda_1$ and $\lambda_2$. By the coordinate change $\mathbf{x} = P\mathbf{y}$ with $\mathbf{y} = (x', y')$, the quadratic equation becomes
$$ax^2 + 2bxy + cy^2 + dx + ey + f = \lambda_1 (x')^2 + \lambda_2 (y')^2 + d'x' + e'y' + f = 0,$$
where $\lambda_1$ and $\lambda_2$ are the eigenvalues of $A$ and $d'$ and $e'$ are some constants. Hence, the classification of the conic sections is reduced to case (1) according to the various possible cases of the eigenvalues of $A$, and the principal axes of the conic section are the directions of the eigenvectors, which are orthogonal to each other.

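(3) Note that the eigenvalues can be expressed by the coefficients $a$, $b$, and $c$ as
$$\lambda_1, \lambda_2 = \frac{(a + c) \pm \sqrt{(a - c)^2 + 4b^2}}{2}.$$
Hence the conic section may be classified according to the various possible cases of $a$, $b$, and $c$ (see Exercise 9.4). Moreover, the principal axes can also be found from the coefficients $a$, $b$, and $c$ as follows. The eigenvector $\mathbf{v}_i$ belonging to $\lambda_i$ is a solution of $(a - \lambda_i)x + by = 0$ for $i = 1, 2$, so that $\mathbf{v}_i$ is a multiple of
$$\begin{bmatrix} b \\ \lambda_i - a \end{bmatrix}, \quad i = 1, 2.$$
Since the eigenvectors are orthogonal to each other, it is enough to find only one of them. Assume $b > 0$ (if $b < 0$, the same computation applies with $\lambda_1$ and $\lambda_2$ interchanged). Then the slope of $\mathbf{v}_1$ from the $x$-axis is
$$\frac{-(a - \lambda_1)}{b} = -\frac{a - c}{2b} + \sqrt{\left(\frac{a - c}{2b}\right)^2 + 1}.$$
Since $b \neq 0$, we define
$$\frac{a - c}{2b} = \cot 2\theta \quad \text{for some } 0 < 2\theta < \pi.$$
Then the slope is $-\cot 2\theta + \csc 2\theta = \tan\theta$, which means that $\mathbf{v}_1$ is the rotation of $\mathbf{e}_1$ by the angle $\theta$ defined above. Hence,
$$P = [\mathbf{v}_1 \ \mathbf{v}_2] = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix},$$
and $d'$, $e'$ in the above equation are given by
$$d' = d\cos\theta + e\sin\theta, \qquad e' = -d\sin\theta + e\cos\theta.$$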
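The recipe in case (3) can be sketched in Python. The helper name `principal_axes_2d` and the evaluation of the arccotangent as $\pi/2 - \arctan u$ (which lands in $(0, \pi)$) are illustrative choices, not part of the text. The function returns the coefficients of $(x')^2$ and $(y')^2$ in the order attached to the rotated axes, so for $b < 0$ the larger eigenvalue comes second, as noted above.

```python
import math


def principal_axes_2d(a, b, c, d=0.0, e=0.0):
    """Rotate the conic a x^2 + 2b xy + c y^2 + d x + e y + f = 0.

    Returns (theta, lam_x, lam_y, d1, e1): theta satisfies
    cot(2 theta) = (a - c) / (2b) with 0 < 2 theta < pi; lam_x, lam_y are the
    coefficients of (x')^2 and (y')^2; d1, e1 are d' and e'.  Assumes b != 0.
    """
    if b == 0:
        raise ValueError("b must be nonzero; otherwise A is already diagonal")
    # arccot(u) = pi/2 - atan(u) takes its values in (0, pi)
    theta = (math.pi / 2 - math.atan((a - c) / (2 * b))) / 2
    cs, sn = math.cos(theta), math.sin(theta)
    # Value of the form on the rotated e1 = (cos theta, sin theta)
    lam_x = a * cs * cs + 2 * b * sn * cs + c * sn * sn
    lam_y = a + c - lam_x  # the trace is unchanged by the rotation
    d1 = d * cs + e * sn
    e1 = -d * sn + e * cs
    return theta, lam_x, lam_y, d1, e1


theta, lx, ly, _, _ = principal_axes_2d(3, 1, 3)   # Example 9.6
print(f"{math.degrees(theta):.1f} {lx:.1f} {ly:.1f}")  # 45.0 4.0 2.0
```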


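Example 9.6 Determine the conic section $3x^2 + 2xy + 3y^2 - 8 = 0$ on $\mathbb{R}^2$.

Solution: In matrix form, it is
$$\begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = 8.$$
The eigenvalues of $A$ with $a = 3 = c$ and $b = 1$ are
$$\lambda_1 = \frac{(a + c) + \sqrt{(a - c)^2 + 4b^2}}{2} = \frac{6 + 2}{2} = 4, \qquad \lambda_2 = 2.$$
Moreover, $\theta = \angle(\mathbf{e}_1, \mathbf{v}_1)$ should satisfy
$$\cot 2\theta = \frac{a - c}{2b} = 0.$$
Thus $2\theta = \frac{\pi}{2}$, or $\theta = \frac{\pi}{4}$, and so the transition matrix is
$$P = \begin{bmatrix} \cos\frac{\pi}{4} & -\sin\frac{\pi}{4} \\ \sin\frac{\pi}{4} & \cos\frac{\pi}{4} \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}.$$
By the change of coordinates $\mathbf{x} = P\mathbf{y}$ with $\mathbf{y} = (x', y')$,
$$3x^2 + 2xy + 3y^2 = \mathbf{x}^T A\mathbf{x} = \mathbf{y}^T P^T AP\mathbf{y} = \begin{bmatrix} x' & y' \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 2 \end{bmatrix} \begin{bmatrix} x' \\ y' \end{bmatrix} = 4(x')^2 + 2(y')^2 = 8,$$
or
$$\frac{(x')^2}{2} + \frac{(y')^2}{4} = 1,$$
which represents an ellipse with axes $\mathbf{v}_1$ and $\mathbf{v}_2$.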


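Problem 9.2 Find an invertible matrix $P$ such that $P^T AP$ is diagonal for each of the following symmetric matrices:
$$(1)\ A = \begin{bmatrix} 0 & 1 & -1 \\ 1 & 1 & 0 \\ -1 & 0 & 2 \end{bmatrix}, \qquad (2)\ A = \begin{bmatrix} 1 & -3 & 1 \\ -3 & 4 & 2 \\ 1 & 2 & 5 \end{bmatrix}, \qquad (3)\ A = \cdots$$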


Problem 9.3 Sketch the graph of each of the following quadratic equations:
(1) $2x^2 + 2y^2 + 6yz + 10z^2 = 9$;
(2) $x^2 - 8xy + 16y^2 - 3z^2 = 8$;
(3) $4x^2 + 12xy + 9y^2 + 3z - 4 = 0$.




9.3 Congruence relations 


From Theorem 9.1, the geometric shape of a quadratic surface in $\mathbb{R}^n$, which is the level surface of a quadratic equation $f(\mathbf{x}) = \mathbf{x}^T A\mathbf{x} + \mathbf{b}^T\mathbf{x} + c = 0$, is essentially determined by the signs of the eigenvalues of $A$.

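Indeed, if $\mathbf{x}_1$ is an eigenvector belonging to a positive eigenvalue $\lambda_1$ and $\mathbf{x}_2$ is an eigenvector belonging to a negative eigenvalue $\lambda_2$ of $A$, then
$$q(\mathbf{x}_1) = \mathbf{x}_1^T A\mathbf{x}_1 = \lambda_1 \mathbf{x}_1^T \mathbf{x}_1 = \lambda_1 \|\mathbf{x}_1\|^2 > 0,$$
$$q(\mathbf{x}_2) = \mathbf{x}_2^T A\mathbf{x}_2 = \lambda_2 \mathbf{x}_2^T \mathbf{x}_2 = \lambda_2 \|\mathbf{x}_2\|^2 < 0.$$
That is, the values of $q$ are positive or negative depending on the signs of the eigenvalues. Therefore, the numbers of positive, negative and zero eigenvalues of $A$ play important roles in determining the shape of the quadratic surface.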

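However, the diagonalization of $A$ in Theorem 9.1 is written as $P^H AP = D$ rather than as the similarity relation $P^{-1}AP = D$; the two coincide there only because $P$ is unitary (or orthogonal). The diagonal matrix in a similarity relation is unique up to the order of its entries (they are the eigenvalues), but a diagonalization $P^H AP = D$ by an arbitrary invertible matrix $P$, called a congruence relation, is not unique: in fact, there are many different ways of diagonalizing a quadratic form $A$ through the congruence relation.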

For instance, a symmetric matrix $A$ may be congruent to many different diagonal matrices (not necessarily by orthogonal matrices). Indeed, suppose $A$ is congruent to a diagonal matrix $D$ by an orthogonal matrix $P$: $P^T AP = D$. Then the matrix $Q = kP$, $k \neq 0$, also diagonalizes $A$ to a different diagonal matrix via a congruence relation:
$$Q^T AQ = (kP)^T A(kP) = k^2 P^T AP = k^2 D,$$
which is also diagonal with diagonal entries $k^2\lambda_1, k^2\lambda_2, \ldots, k^2\lambda_n$. In this case, if $k \neq \pm 1$, $Q$ is not an orthogonal matrix and the resulting diagonal entries are no longer the eigenvalues of $A$.

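In general, let $A$ be a Hermitian matrix. As we know already, a change of variables (or of basis) in $\mathbb{C}^n$ is expressed by $[\mathbf{x}]_\alpha = P[\mathbf{x}]_\beta$, where $[\mathbf{x}]_\alpha$ and $[\mathbf{x}]_\beta$ are the coordinate representations of the same vector $\mathbf{x}$ in $\mathbb{C}^n$ with respect to two bases $\alpha = \{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ and $\beta = \{\mathbf{e}'_1, \mathbf{e}'_2, \ldots, \mathbf{e}'_n\}$ for $\mathbb{C}^n$, respectively, and $P = [\mathrm{id}]_\beta^\alpha$ is the transition matrix from $\beta$ to $\alpha$. Thus, by setting $\mathbf{x} = [\mathbf{x}]_\alpha$ and $\mathbf{y} = [\mathbf{x}]_\beta$ so that $\mathbf{x} = P\mathbf{y}$, a Hermitian form $q(\mathbf{x}) = \mathbf{x}^H A\mathbf{x}$ can be rewritten as
$$q(\mathbf{x}) = \mathbf{x}^H A\mathbf{x} = (P\mathbf{y})^H A(P\mathbf{y}) = \mathbf{y}^H (P^H AP)\mathbf{y} = \mathbf{y}^H B\mathbf{y},$$
where $B = P^H AP$ and $\mathbf{y}^H B\mathbf{y}$ is the expression of $q(\mathbf{x}) = \mathbf{x}^H A\mathbf{x}$ in the new basis (or new coordinate system) $\beta$.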




Definition 9.2 Two complex $n \times n$ matrices $A$ and $B$ are said to be congruent if there exists an invertible complex matrix $P$ such that $P^H AP = B$.


It is easily seen that the congruence relation is an equivalence relation on the vector space $M_{n\times n}(\mathbb{C})$. The above argument shows that a change of basis replaces the matrix of a Hermitian form by a congruent one; conversely, two congruent matrices represent the same Hermitian form with respect to different bases. In this sense a Hermitian form on $\mathbb{C}^n$ is independent of the choice of basis, although its matrix is not.

For a practical method of diagonalizing a Hermitian matrix $A$ through congruence, one can also use forward elimination on $A$. In fact, each step in the forward elimination on $A$ is a multiplication of $A$ on the left by an elementary matrix $E$ of type 3, and multiplying $A$ on the right by $E^H$ performs the same operation on the columns of $A$. Since $A$ is Hermitian, $EAE^H$ is again a Hermitian matrix, now with two zero entries in symmetric positions. For instance,


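$$EAE^H = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ -\frac{a_{n1}}{a_{11}} & 0 & \cdots & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} 1 & 0 & \cdots & -\frac{\bar{a}_{n1}}{a_{11}} \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & \cdots & 0 \\ a_{21} & a_{22} & \cdots & * \\ \vdots & \vdots & & \vdots \\ 0 & * & \cdots & * \end{bmatrix},$$
provided $a_{11} \neq 0$, where $*$ marks the entries changed by the operation.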


Thus, by an inductive process (assuming no row interchanges are needed), we get $PAP^H = Q^H AQ = D$, where $P = E_k \cdots E_2 E_1 = Q^H$ is an invertible lower triangular matrix as a product of elementary matrices and $D$ is the diagonal matrix of the pivots. Equivalently,
$$A = LDL^H, \quad\text{or}\quad Q^H AQ = D,$$
with $Q^H = L^{-1} = P$. This process can be summarized as
$$[A \mid I] \to [E_1 AE_1^H \mid E_1 I] \to [E_2 E_1 AE_1^H E_2^H \mid E_2 E_1 I] \to \cdots \to [E_k \cdots E_1 AE_1^H \cdots E_k^H \mid E_k \cdots E_1 I] = [D \mid P].$$
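The scheme $[A \mid I] \to [D \mid P]$ can be sketched in Python with exact rational arithmetic from the standard `fractions` module. This is a minimal sketch for real symmetric matrices, assuming (as above) that no row interchanges are needed; the names `congruence_diagonalize` and `inertia` are hypothetical. The second helper counts the positive, negative and zero diagonal entries of $D$.

```python
from fractions import Fraction


def congruence_diagonalize(A):
    """Reduce [A | I] to [D | P] so that P A P^T = D (real symmetric A).

    Each step applies a type-3 row operation to both blocks and the same
    operation to the columns of the left block.  Assumes every pivot is
    nonzero, i.e. no row interchanges are needed.
    """
    n = len(A)
    M = [[Fraction(x) for x in row] for row in A]
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n):
        pivot = M[k][k]
        if pivot == 0:
            raise ValueError("zero pivot: a row interchange would be needed")
        for i in range(k + 1, n):
            m = M[i][k] / pivot
            # Row operation R_i <- R_i - m R_k on [M | P]
            M[i] = [M[i][j] - m * M[k][j] for j in range(n)]
            P[i] = [P[i][j] - m * P[k][j] for j in range(n)]
            # Matching column operation C_i <- C_i - m C_k on M only
            for r in range(n):
                M[r][i] -= m * M[r][k]
    return [M[i][i] for i in range(n)], P


def inertia(diagonal):
    """Numbers of positive, negative and zero diagonal entries."""
    p = sum(1 for d in diagonal if d > 0)
    q = sum(1 for d in diagonal if d < 0)
    return p, q, len(diagonal) - p - q


D, P = congruence_diagonalize([[1, 1, 2], [1, 0, 3], [2, 3, 6]])
print([int(d) for d in D])  # [1, -1, 3]
print(inertia(D))           # (2, 1, 0)
```

Fractions keep the pivots exact, so a zero pivot is detected reliably rather than up to rounding.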


The following example shows how to diagonalize a real symmetric matrix 
A by the forward elimination. 




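Example 9.7 Consider a symmetric matrix
$$A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & 3 \\ 2 & 3 & 6 \end{bmatrix}.$$
The forward elimination produces
$$[A \mid I] = \left[\begin{array}{ccc|ccc} 1 & 1 & 2 & 1 & 0 & 0 \\ 1 & 0 & 3 & 0 & 1 & 0 \\ 2 & 3 & 6 & 0 & 0 & 1 \end{array}\right] \to [E_2 E_1 AE_1^T E_2^T \mid E_2 E_1 I] = \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & -1 & 1 & -1 & 1 & 0 \\ 0 & 1 & 2 & -2 & 0 & 1 \end{array}\right]$$
$$\to [E_3 E_2 E_1 AE_1^T E_2^T E_3^T \mid E_3 E_2 E_1 I] = \left[\begin{array}{ccc|ccc} 1 & 0 & 0 & 1 & 0 & 0 \\ 0 & -1 & 0 & -1 & 1 & 0 \\ 0 & 0 & 3 & -3 & 1 & 1 \end{array}\right] = [D \mid P],$$
where
$$E_1 = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad E_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}, \quad E_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}.$$
Hence,
$$PAP^T = D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 3 \end{bmatrix}, \qquad P = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ -3 & 1 & 1 \end{bmatrix}.$$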


Note that any of the many different diagonalizations by a congruence relation may be used in place of the one in the principal axes theorem (Theorem 9.1) to determine the type of a quadratic surface, because the geometric type depends only on the numbers of positive, negative and zero diagonal entries. A natural question is then whether these numbers depend on the chosen diagonalization of $A$. Sylvester's law of inertia (Theorem 9.15), which will be proven in Section 9.6, says that the numbers of positive, negative and zero diagonal entries, respectively, of two congruent diagonal matrices are the same, no matter what the diagonalizing matrices are. That is, these numbers are invariants under the congruence relation. Therefore, we can make the following definition:




Definition 9.3 The inertia of a Hermitian matrix A is a triple of integers 
denoted by In(A) = (p,q, k), where p, q and k are the numbers of positive, 
negative and zero eigenvalues of A (or diagonal entries of D), respectively. 


Thus, the geometric type of the quadratic surface $\mathbf{x}^T A\mathbf{x} = c$ in $\mathbb{R}^n$ is determined by the inertia $\mathrm{In}(A)$ in the following sense. If $\mathrm{In}(A) = (p, q, k)$, then $\mathrm{In}(-A) = (q, p, k)$, and the equation $\mathbf{x}^T A\mathbf{x} = c$ is inconsistent when $p = 0$ and $c > 0$. Hence, after multiplying the equation by $-1$ if necessary, it suffices to consider the cases $c \geq 0$ and $p > 0$. Excluding the inconsistent cases, we have the following characterization of the solution sets for $n = 2$ and $3$:


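For $n = 2$, there are only three possible cases for $\mathrm{In}(A)$:

| $\mathrm{In}(A)$ | solution of $\mathbf{x}^T A\mathbf{x} = c$, $c > 0$ | solution of $\mathbf{x}^T A\mathbf{x} = 0$ |
|---|---|---|
| $(2, 0, 0)$ | ellipse | a point |
| $(1, 1, 0)$ | hyperbola | two lines crossing at $\mathbf{0}$ |
| $(1, 0, 1)$ | two parallel lines | a line |

For $n = 3$, there are six possible cases for $\mathrm{In}(A)$:

| $\mathrm{In}(A)$ | solution of $\mathbf{x}^T A\mathbf{x} = c$, $c > 0$ | solution of $\mathbf{x}^T A\mathbf{x} = 0$ |
|---|---|---|
| $(3, 0, 0)$ | ellipsoid | a point |
| $(2, 1, 0)$ | one-sheeted hyperboloid | elliptic cone |
| $(1, 2, 0)$ | two-sheeted hyperboloid | elliptic cone |
| $(2, 0, 1)$ | elliptic cylinder | a line |
| $(1, 1, 1)$ | hyperbolic cylinder | two planes crossing in a line |
| $(1, 0, 2)$ | two parallel planes | a plane |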
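The normalization above and the $n = 3$ table can be encoded as a lookup. This is a sketch only; the function name `quadric_type` and the two labels returned for the cases excluded above (`"empty set"` for an inconsistent equation and `"all of R^3"` for $A = 0$, $c = 0$) are assumptions added for completeness.

```python
def quadric_type(p, q, k, c):
    """Solution set of x^T A x = c in R^3, given In(A) = (p, q, k)."""
    if p + q + k != 3:
        raise ValueError("this table covers n = 3 only")
    # Multiplying the equation by -1 swaps p and q and negates c,
    # so we may assume c >= 0, and p > 0 whenever A != 0.
    if c < 0 or (c == 0 and p == 0):
        p, q, c = q, p, -c
    if p == 0:
        # Assumed labels for the cases the table excludes
        return "empty set" if c > 0 else "all of R^3"
    table = {
        (3, 0, 0): ("ellipsoid", "a point"),
        (2, 1, 0): ("one-sheeted hyperboloid", "elliptic cone"),
        (1, 2, 0): ("two-sheeted hyperboloid", "elliptic cone"),
        (2, 0, 1): ("elliptic cylinder", "a line"),
        (1, 1, 1): ("hyperbolic cylinder", "two planes crossing in a line"),
        (1, 0, 2): ("two parallel planes", "a plane"),
    }
    for_positive_c, for_zero_c = table[(p, q, k)]
    return for_positive_c if c > 0 else for_zero_c


print(quadric_type(1, 1, 1, 1))  # hyperbolic cylinder
```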


In general, with $p > 0$, $\mathrm{In}(A)$ has $n(n + 1)/2$ possibilities, each characterizing a different geometric type of quadric. For example, if $\mathrm{In}(A) = (n, 0, 0)$ and $c > 0$, then the equation $\mathbf{x}^T A\mathbf{x} = c$ describes an ellipsoid in $\mathbb{R}^n$, etc.


Example 9.8 In Example 9.7, the diagonal entries of $D$ are $1$, $-1$ and $3$. Thus $\mathrm{In}(A) = (2, 1, 0)$, and one can check by a direct computation that $A = Q^T DQ = LDL^T$ with $Q^T = L = P^{-1}$.


Example 9.9 The eigenvalues of $A$ in Example 9.2 are $\frac{1}{2}(7 \pm \sqrt{37})$, both positive since $\sqrt{37} < 7$. Therefore, $\mathrm{In}(A) = (2, 0, 0)$ and the conic section is an ellipse.


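Example 9.10 Determine the quadratic surface for $2xy + 2xz = 1$ on $\mathbb{R}^3$.

Solution: The matrix for the given quadratic form is $A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}$, and the eigenvalues of $A$ can be found to be $\lambda_1 = \sqrt{2}$, $\lambda_2 = -\sqrt{2}$, $\lambda_3 = 0$, with associated orthonormal eigenvectors
$$\mathbf{v}_1 = \left(\frac{1}{\sqrt{2}}, \frac{1}{2}, \frac{1}{2}\right), \quad \mathbf{v}_2 = \left(-\frac{1}{\sqrt{2}}, \frac{1}{2}, \frac{1}{2}\right), \quad \mathbf{v}_3 = \left(0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right),$$
respectively. Hence, an orthogonal matrix $P$ that diagonalizes $A$ is
$$P = \frac{1}{2}\begin{bmatrix} \sqrt{2} & -\sqrt{2} & 0 \\ 1 & 1 & -\sqrt{2} \\ 1 & 1 & \sqrt{2} \end{bmatrix},$$
and with the change of coordinates $\mathbf{x} = P\mathbf{y}$, that is,
$$x = \frac{1}{\sqrt{2}}(x' - y'), \quad y = \frac{1}{2}(x' + y' - \sqrt{2}z'), \quad z = \frac{1}{2}(x' + y' + \sqrt{2}z'),$$
the equation is transformed to $\sqrt{2}(x')^2 - \sqrt{2}(y')^2 = 1$, which is a hyperbolic cylinder as shown in Figure 9.6. Note that $\mathrm{In}(A) = (1, 1, 1)$.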


Figure 9.6: Hyperbolic cylinder 
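A quick floating-point check of Example 9.10, sketched in plain Python: it confirms that the matrix $P$ above is orthogonal, that $P^T AP = \mathrm{diag}(\sqrt{2}, -\sqrt{2}, 0)$, and that the substitution $\mathbf{x} = P\mathbf{y}$ turns $2xy + 2xz$ into $\sqrt{2}(x')^2 - \sqrt{2}(y')^2$ at a sample point. The helper names and the sample point are illustrative.

```python
import math

s = math.sqrt(2)
A = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
# Columns of P are the orthonormal eigenvectors v1, v2, v3 of Example 9.10
P = [[s / 2, -s / 2, 0], [0.5, 0.5, -s / 2], [0.5, 0.5, s / 2]]


def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


D = matmul(transpose(P), matmul(A, P))   # expected diag(sqrt 2, -sqrt 2, 0)
I3 = matmul(transpose(P), P)             # expected identity

# Substitution x = P y at a sample point y = (x', y', z')
yp = [0.3, -1.2, 0.7]
x, y, z = (sum(P[i][j] * yp[j] for j in range(3)) for i in range(3))
lhs = 2 * x * y + 2 * x * z
rhs = s * yp[0] ** 2 - s * yp[1] ** 2
print(abs(lhs - rhs) < 1e-12)  # True
```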


For some particular values of $p$, $q$ and $k$, the quadratic form $\mathbf{x}^H A\mathbf{x}$, or $A$, has special names as follows:


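Definition 9.4 Let $A = [a_{ij}] \in M_{n\times n}(\mathbb{C})$ be a Hermitian matrix, and let $\mathbf{x} = (x_1, x_2, \ldots, x_n) \in \mathbb{C}^n$. Then the matrix $A$, or the quadratic form $\mathbf{x}^H A\mathbf{x}$, is said to be

(1) positive definite if $\mathbf{x}^H A\mathbf{x} = \sum_{i,j} a_{ij}\bar{x}_i x_j > 0$ for all nonzero $\mathbf{x}$,

(2) positive semidefinite if $\mathbf{x}^H A\mathbf{x} = \sum_{i,j} a_{ij}\bar{x}_i x_j \geq 0$ for all $\mathbf{x}$,

(3) negative definite if $\mathbf{x}^H A\mathbf{x} = \sum_{i,j} a_{ij}\bar{x}_i x_j < 0$ for all nonzero $\mathbf{x}$,

(4) negative semidefinite if $\mathbf{x}^H A\mathbf{x} = \sum_{i,j} a_{ij}\bar{x}_i x_j \leq 0$ for all $\mathbf{x}$,

(5) indefinite if $\mathbf{x}^H A\mathbf{x}$ takes both positive and negative values.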


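For example, the real symmetric matrix $A = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix}$ is positive definite, because the quadratic form satisfies
$$\begin{aligned}
\mathbf{x}^T A\mathbf{x} &= \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} 2x_1 - x_2 \\ -x_1 + 2x_2 - x_3 \\ -x_2 + 2x_3 \end{bmatrix} \\
&= x_1(2x_1 - x_2) + x_2(-x_1 + 2x_2 - x_3) + x_3(-x_2 + 2x_3) \\
&= 2x_1^2 - 2x_1x_2 + 2x_2^2 - 2x_2x_3 + 2x_3^2 \\
&= x_1^2 + (x_1 - x_2)^2 + (x_2 - x_3)^2 + x_3^2 > 0
\end{aligned}$$
unless $x_1 = x_2 = x_3 = 0$.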


Depending on the inertia, we have the following characterizations from the principal axes theorem.


Corollary 9.2 A Hermitian matrix $A$ is

(1) positive definite if and only if all the eigenvalues of $A$ are positive,

(2) positive semidefinite if and only if all the eigenvalues of $A$ are nonnegative,

(3) negative definite if and only if all the eigenvalues of $A$ are negative,

(4) negative semidefinite if and only if all the eigenvalues of $A$ are nonpositive,

(5) indefinite if and only if $A$ has both positive and negative eigenvalues.


Remark: Thus, for a quadratic form $A$ with $\mathrm{In}(A) = (p, q, k)$, $A$ is
positive definite if $p = n$ and so $q = 0 = k$,
positive semidefinite if $p \neq 0$ and $q = 0 \neq k$,
negative definite if $q = n$ and so $p = 0 = k$,
negative semidefinite if $q \neq 0$ and $p = 0 \neq k$,
indefinite if $pq \neq 0$ and $k = 0$,
and semi-indefinite if $pqk \neq 0$.
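The Remark is a simple case analysis on $(p, q, k)$, and by Corollary 9.2 the inertia can be read off from the signs of the eigenvalues. A sketch follows; the names `definiteness` and `inertia_from_eigenvalues`, the tolerance `tol` for floating-point input, and the label `"zero form"` for $p = q = 0$ (a case the Remark does not name) are assumptions.

```python
def inertia_from_eigenvalues(eigenvalues, tol=1e-12):
    """(p, q, k): counts of positive, negative and (numerically) zero values."""
    p = sum(1 for lam in eigenvalues if lam > tol)
    q = sum(1 for lam in eigenvalues if lam < -tol)
    return p, q, len(eigenvalues) - p - q


def definiteness(p, q, k):
    """Name of a Hermitian form with In(A) = (p, q, k), following the Remark."""
    n = p + q + k
    if p == n:
        return "positive definite"
    if q == n:
        return "negative definite"
    if p and not q and k:
        return "positive semidefinite"
    if q and not p and k:
        return "negative semidefinite"
    if p and q and not k:
        return "indefinite"
    if p and q and k:
        return "semi-indefinite"
    return "zero form"  # p = q = 0: assumed label, not named in the Remark


# Example 9.10: eigenvalues sqrt 2, -sqrt 2, 0
print(definiteness(*inertia_from_eigenvalues([2 ** 0.5, -(2 ** 0.5), 0.0])))
# semi-indefinite
```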




Example 9.11 For $q(x, y) = 2x^2 - 4xy + 5y^2$, determine the nature of the critical point $(0, 0)$.

Solution: The matrix of the quadratic form is $A = \begin{bmatrix} 2 & -2 \\ -2 & 5 \end{bmatrix}$.

(I) Solve $\det(\lambda I - A) = \lambda^2 - 7\lambda + 6 = (\lambda - 6)(\lambda - 1) = 0$ to get the eigenvalues $\lambda_1 = 6$ and $\lambda_2 = 1$. Since both eigenvalues are positive, $A$ is positive definite and hence $(0, 0)$ is a global minimum.

(II) By the forward elimination, we get
$$A = \begin{bmatrix} 2 & -2 \\ -2 & 5 \end{bmatrix} \to EAE^T = \begin{bmatrix} 2 & 0 \\ 0 & 3 \end{bmatrix}, \quad\text{where } E = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}.$$
Since the pivots are all positive, so are the eigenvalues (see Theorem 9.3). Thus $\mathrm{In}(A) = (2, 0, 0)$, $A$ is positive definite, and hence $(0, 0)$ is a global minimum.
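Both routes (I) and (II) of Example 9.11 reduce to closed formulas for a $2 \times 2$ symmetric matrix $\begin{bmatrix} a & b \\ b & c \end{bmatrix}$: the eigenvalues are the roots of $\lambda^2 - (a + c)\lambda + (ac - b^2)$, and the pivots are $a$ and $(ac - b^2)/a$. A small sketch, assuming $a \neq 0$; the function name is hypothetical.

```python
import math


def eigen_and_pivots_2x2(a, b, c):
    """Eigenvalues and pivots of the symmetric matrix [[a, b], [b, c]].

    Assumes a != 0 so that the first pivot can be used.
    """
    trace, det = a + c, a * c - b * b
    # sqrt((a - c)^2 + 4 b^2), computed without forming a difference of squares
    root = math.hypot(a - c, 2 * b)
    eigenvalues = ((trace + root) / 2, (trace - root) / 2)
    pivots = (a, det / a)
    return eigenvalues, pivots


# Example 9.11: q(x, y) = 2x^2 - 4xy + 5y^2, so a = 2, b = -2, c = 5
print(eigen_and_pivots_2x2(2, -2, 5))  # ((6.0, 1.0), (2, 3.0))
```

The product of the pivots equals the product of the eigenvalues (both are $\det A$), which is why their signs agree here.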


Problem 9.4 For each $A$ of the following Hermitian matrices, find an invertible matrix $P$ such that $P^H AP$ is diagonal and determine $\mathrm{In}(A)$:


0 1 i l (eae 1 0 0 
Q)A=] 1 10/,@A=/]14+3i 4 2%1,@8)A=] 0 0 
0 2 0 


—i 1 —2i 5 


9.4 Characterization of quadratic forms 


We here present some practical criteria for the definiteness of a matrix $A$ in terms of determinants or pivots. For this, we again look at the quadratic form in two real variables, $q(x, y) = ax^2 + 2bxy + cy^2$, which, for $a \neq 0$, may be rewritten by completing the square as
$$q(\mathbf{x}) = ax^2 + 2bxy + cy^2 = a\left(x + \frac{b}{a}y\right)^2 + \left(c - \frac{b^2}{a}\right)y^2.$$
We see that $q$ is positive definite, i.e., $q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x} > 0$ for any nonzero vector $\mathbf{x} = (x, y) \in \mathbb{R}^2$, if and only if $a > 0$ and $ac - b^2 > 0$, or equivalently, the determinants of
$$[a] \quad\text{and}\quad \begin{bmatrix} a & b \\ b & c \end{bmatrix}$$
are positive.




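A generalization of these conditions involves $n$ submatrices of $A$, called the principal submatrices of $A$, which are defined as the upper left square submatrices
$$A_1 = [a_{11}], \quad A_2 = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad A_3 = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}, \quad \ldots, \quad A_n = A.$$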

With this construction, we have the following characterization of positive 
definite matrices. 


Theorem 9.3 The following are equivalent for a Hermitian matrix $A$:
(1) $A$ is positive definite, i.e., $\mathbf{x}^H A\mathbf{x} > 0$ for all nonzero vectors $\mathbf{x}$;
(2) all the eigenvalues of $A$ are positive;
(3) all the principal submatrices $A_k$ have positive determinants;
(4) all the pivots (without row interchanges) are positive;
(5) there exists a nonsingular upper triangular matrix $R$ with positive diagonal entries such that $A = R^H R$ (called a Cholesky decomposition or a Cholesky factorization).


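Proof: $(1) \Rightarrow (2)$ was shown.

$(2) \Rightarrow (3)$ If $A$ has positive eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$, then $\det A = \lambda_1\lambda_2\cdots\lambda_n > 0$. To prove the same result for all the submatrices $A_k$, we show that if $A$ is positive definite (which holds by Corollary 9.2), so is every $A_k$. For each $k = 1, \ldots, n$, consider all the vectors whose last $n - k$ components are zero, say $\mathbf{x} = [x_1 \cdots x_k \ 0 \cdots 0]^T = [\mathbf{x}_k^T \ \mathbf{0}^T]^T$, where $\mathbf{x}_k$ is any vector in $\mathbb{C}^k$. Then
$$\mathbf{x}^H A\mathbf{x} = \begin{bmatrix} \mathbf{x}_k^H & \mathbf{0}^H \end{bmatrix} \begin{bmatrix} A_k & * \\ * & * \end{bmatrix} \begin{bmatrix} \mathbf{x}_k \\ \mathbf{0} \end{bmatrix} = \mathbf{x}_k^H A_k\mathbf{x}_k.$$
Since $\mathbf{x}^H A\mathbf{x} > 0$ for all nonzero $\mathbf{x}$, we get $\mathbf{x}_k^H A_k\mathbf{x}_k > 0$ for all nonzero $\mathbf{x}_k \in \mathbb{C}^k$; that is, the $A_k$'s are positive definite, so all eigenvalues of $A_k$ are positive, and their determinants are positive.

$(3) \Rightarrow (4)$ Note that the diagonal entries of a Hermitian matrix are all real. In the forward elimination on a Hermitian matrix $A = [a_{ij}]$, the first pivot $d_1$ is $a_{11}$, which is positive since $a_{11} = \det A_1 > 0$. Since this row operation transforms $A_2$ into an upper triangular matrix with real diagonal entries (since $a_{12} = \bar{a}_{21}$) and does not change the determinant of $A_2$, we clearly have
$$d_1 d_2 = \det A_2 > 0,$$
which implies $d_2 > 0$. Now by induction,
$$d_k = \frac{\det A_k}{\det A_{k-1}} > 0, \quad k = 1, \ldots, n \quad (\text{with } \det A_0 = 1).$$
Hence, at each step of the forward elimination, the pivot $d_k$ is positive and no row exchanges are necessary. Consequently, we have $A = LDL^H$, where $L^{-1}$, the forward elimination on $A$, is nonsingular lower triangular with 1's on the diagonal, and $D$ is the diagonal matrix of the positive pivots. Note that multiplying $A$ on the right by $(L^{-1})^H$ performs the same operations on the columns of $A$.

$(4) \Rightarrow (5)$ Since the diagonal entries of $D$ in $A = LDL^H$ are all positive, we can define
$$\sqrt{D} = \begin{bmatrix} \sqrt{d_1} & & 0 \\ & \ddots & \\ 0 & & \sqrt{d_n} \end{bmatrix}.$$
Then, clearly, $\det(\sqrt{D}) > 0$, $D = \sqrt{D}\sqrt{D}$ and $(\sqrt{D})^H = \sqrt{D}$. Hence,
$$A = LDL^H = (L\sqrt{D})(\sqrt{D}L^H) = R^H R, \qquad R = \sqrt{D}L^H,$$
and $R$ is upper triangular with positive diagonal entries $\sqrt{d_1}, \ldots, \sqrt{d_n}$.

$(5) \Rightarrow (1)$ Let $A = R^H R$ for a nonsingular matrix $R$. Then, for $\mathbf{x} \neq \mathbf{0}$,
$$\mathbf{x}^H A\mathbf{x} = \mathbf{x}^H (R^H R)\mathbf{x} = (R\mathbf{x})^H (R\mathbf{x}) = \|R\mathbf{x}\|^2 > 0,$$
because $R\mathbf{x} \neq \mathbf{0}$.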
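Step $(4) \Rightarrow (5)$ can be carried out numerically. The sketch below computes the upper triangular factor $R$ of a real symmetric positive definite matrix row by row, so that $A = R^T R$; since the factor with positive diagonal is unique, it agrees with $R = \sqrt{D}L^T$, and the squares of its diagonal entries are the pivots $d_k$. The name `cholesky_upper` is hypothetical, and only the standard `math` module is used.

```python
import math


def cholesky_upper(A):
    """Upper triangular R with positive diagonal such that A = R^T R.

    A is assumed real symmetric; a ValueError signals that it is not
    positive definite.  R[k][k] ** 2 equals the k-th pivot d_k.
    """
    n = len(A)
    R = [[0.0] * n for _ in range(n)]
    for k in range(n):
        d = A[k][k] - sum(R[i][k] ** 2 for i in range(k))  # the k-th pivot
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        R[k][k] = math.sqrt(d)
        for j in range(k + 1, n):
            s = A[k][j] - sum(R[i][k] * R[i][j] for i in range(k))
            R[k][j] = s / R[k][k]
    return R


A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
R = cholesky_upper(A)
print([round(R[k][k] ** 2, 6) for k in range(3)])  # [2.0, 1.5, 1.333333]
```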


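Problem 9.5 Determine which one of the following matrices $A$ and $B$ is positive definite. For the positive definite one, find a nonsingular matrix $W$ such that it equals $W^T W$:
$$A = \begin{bmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & -1 & 0 \\ -1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix}.$$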


Problem 9.6 Let $A$ be a positive definite matrix. Prove that $C^T AC$ is also positive definite for any nonsingular matrix $C$.


Since a Hermitian matrix A is negative definite if and only if —A is 
positive definite, one can get the following theorem from Theorem 9.3. 


Theorem 9.4 The following are equivalent for a Hermitian matrix $A$:
(1) $A$ is negative definite, i.e., $\mathbf{x}^H A\mathbf{x} < 0$ for all nonzero vectors $\mathbf{x}$;
(2) all the eigenvalues of $A$ are negative;
(3) the determinants of the principal submatrices $A_k$ alternate in sign, i.e., $\det A_1 < 0$, $\det A_2 > 0$, $\det A_3 < 0$, and so on;
(4) all the pivots (without row interchanges) are negative;
(5) there exists a nonsingular upper triangular matrix $R$ with positive diagonal entries such that $A = -R^H R$.


Problem 9.7 Show that the determinant of a negative definite n x n symmetric 
matrix is positive if n is even and negative if n is odd. 


One can easily establish the following analogous theorem for semidefinite 
matrices. 


Theorem 9.5 The following are equivalent for a Hermitian matrix $A$:
(1) $A$ is positive semidefinite, i.e., $\mathbf{x}^H A\mathbf{x} \geq 0$ for all vectors $\mathbf{x}$;
(2) all the eigenvalues of $A$ are nonnegative;
(3) all the principal submatrices of $A$ have nonnegative determinants;
(4) all the pivots are nonnegative;
(5) there exists a matrix $R$ whose rank equals the rank $r$ of $A$ such that $A = R^H R$.


In (3), a principal submatrix is formed by deleting the $i$-th row and the $i$-th column together, for any choice of indices $i$ (not only the last ones). In (5), if the $r \times r$ upper left corner of $A$ is positive definite, so that we get $r$ positive pivots without row exchanges, we get $R = \sqrt{D}L^H$ with $D = \mathrm{diag}(d_1, \ldots, d_r, 0, \ldots, 0)$. If we need row exchanges, we also exchange the corresponding columns to maintain symmetry. Thus the forward elimination is applied to $PAP^T$ for a permutation matrix $P$ to get the above expression. $R$ can be obtained from any of the choices, $LDL^H$ or the diagonalization by eigenvectors:
$$A = (L\sqrt{D})(\sqrt{D}L^H), \quad A = UDU^H = (U\sqrt{D})(\sqrt{D}U^H), \quad\text{or}\quad A = UDU^H = (U\sqrt{D}U^H)(U\sqrt{D}U^H).$$


Problem 9.8 State the corresponding conditions to the ones in Theorem 9.5 for 
the negative semidefinite forms. 


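Problem 9.9 Which of the following matrices are positive definite? negative definite? indefinite?
$$(1)\ \begin{bmatrix} 1 & 2 & 1 \\ 2 & 1 & 1 \\ 1 & 1 & 2 \end{bmatrix}, \qquad (2)\ \begin{bmatrix} 2 & 0 & 0 \\ 0 & 5 & 3 \\ 0 & 3 & 5 \end{bmatrix}, \qquad (3)\ \begin{bmatrix} 3 & -1 & 0 \\ -1 & 2 & 1 \\ 0 & 1 & 3 \end{bmatrix}.$$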




9.5 Application to extrema of functions on $\mathbb{R}^n$


In calculus, one uses the second derivative test to see whether a given function $y = f(x)$ takes a local maximum or a local minimum at a critical point. For functions of more than one variable, we have the same test for local extrema, expressed in terms of a quadratic form.

Let $f(\mathbf{x})$ be a real-valued function on $\mathbb{R}^n$. A point $\mathbf{x}_0$ in $\mathbb{R}^n$ at which either a first partial derivative of $f$ fails to exist or the first partial derivatives of $f$ are all zero is called a critical point of $f$. If $f(\mathbf{x})$ has either a local maximum or a local minimum at a point $\mathbf{x}_0$ and all the first partial derivatives of $f$ exist at $\mathbf{x}_0$, then all of them must be zero, i.e., $f_{x_i}(\mathbf{x}_0) = \frac{\partial f}{\partial x_i}(\mathbf{x}_0) = 0$ for all $i = 1, 2, \ldots, n$. Thus, if $f(\mathbf{x})$ has first partial derivatives everywhere, its local maxima and minima occur only at critical points.

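Let us first consider a function of two variables, $f(\mathbf{x})$ with $\mathbf{x} = (x, y) \in \mathbb{R}^2$, having a critical point $\mathbf{x}_0 = (x_0, y_0) \in \mathbb{R}^2$. If $f$ has continuous third partial derivatives in a neighborhood of $\mathbf{x}_0$, it can be expanded in a Taylor series about that point: for $\mathbf{x} = (x_0 + h, y_0 + k)$,
$$\begin{aligned}
f(\mathbf{x}) &= f(x_0 + h, y_0 + k) \\
&= f(\mathbf{x}_0) + \left(hf_x(\mathbf{x}_0) + kf_y(\mathbf{x}_0)\right) + \frac{1}{2}\left(h^2 f_{xx}(\mathbf{x}_0) + 2hkf_{xy}(\mathbf{x}_0) + k^2 f_{yy}(\mathbf{x}_0)\right) + R \\
&= f(\mathbf{x}_0) + \frac{1}{2}\left(ah^2 + 2bhk + ck^2\right) + R,
\end{aligned}$$
where
$$a = f_{xx}(\mathbf{x}_0), \quad b = f_{xy}(\mathbf{x}_0), \quad c = f_{yy}(\mathbf{x}_0),$$
the first-order terms vanish because $\mathbf{x}_0$ is a critical point, and $R$ is the remainder term such that $|R|$ gets smaller than the absolute value of $\frac{1}{2}(ah^2 + 2bhk + ck^2)$ for sufficiently small $h$ and $k$. Hence
$$f(\mathbf{x}) - f(\mathbf{x}_0) \approx \frac{1}{2}\left(ah^2 + 2bhk + ck^2\right),$$
and so the two sides have the same sign. Note that the expression
$$q(h, k) = ah^2 + 2bhk + ck^2 = \begin{bmatrix} h & k \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} h \\ k \end{bmatrix}$$
is a quadratic form in the variables $h$ and $k$, where
$$H = \begin{bmatrix} a & b \\ b & c \end{bmatrix} = \begin{bmatrix} f_{xx}(\mathbf{x}_0) & f_{xy}(\mathbf{x}_0) \\ f_{xy}(\mathbf{x}_0) & f_{yy}(\mathbf{x}_0) \end{bmatrix}$$
is a symmetric matrix, called the Hessian of $f$ at $\mathbf{x}_0 = (x_0, y_0)$. Hence, $f(x, y)$ has a local minimum (or maximum) at $\mathbf{x}_0$ if
$$q(h, k) = \mathbf{x}^T H\mathbf{x} > 0 \quad (\text{or} < 0) \quad\text{for all nonzero } \mathbf{x} = (h, k),$$
respectively. The critical point $\mathbf{x}_0$ is called a saddle point if $q(h, k)$ takes both positive and negative values; at such a point, $f(x, y)$ has neither a local minimum nor a local maximum. This is the second derivative test for local extrema of $f(x, y)$.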


In particular, a quadratic form
$$q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x} = \begin{bmatrix} x & y \end{bmatrix} \begin{bmatrix} a & b \\ b & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = ax^2 + 2bxy + cy^2$$
for $\mathbf{x} = [x \ y]^T \in \mathbb{R}^2$ is itself a function of two variables, and its first partial derivatives are
$$q_x = 2ax + 2by, \qquad q_y = 2bx + 2cy.$$
By setting these equal to zero, we see that $\mathbf{0} = (0, 0)$ is a critical point of $q$. If $ac - b^2 \neq 0$, this is the only critical point of $q$. Note that the Hessian of $q$ is
$$H = \begin{bmatrix} 2a & 2b \\ 2b & 2c \end{bmatrix} = 2A.$$
Thus, $H$, or $A$, is nonsingular if and only if $\frac{1}{4}\det H = \det A = ac - b^2 \neq 0$.

Since $q(\mathbf{0}) = 0$, it follows that the quadratic form $q$ takes a strict global minimum (or global maximum) at $\mathbf{0}$ if and only if
$$q(\mathbf{x}) = \mathbf{x}^T A\mathbf{x} > 0 \quad (\text{or} < 0) \quad\text{for all } \mathbf{x} \neq \mathbf{0},$$
respectively. If $\mathbf{x}^T A\mathbf{x}$ takes both positive and negative values, then $\mathbf{0}$ is a saddle point. Thus, if $A$, or $H$, is nonsingular, the quadratic form $q$ has either the global minimum when $A$ is positive definite, the global maximum when $A$ is negative definite, or a saddle point at $\mathbf{0}$ when $A$ is indefinite. If $A$, or $H$, is singular, then the graph of the function looks like a cylinder.


In general, if a function $f$ of two variables has a nonsingular Hessian $H$ at a critical point $\mathbf{x}_0 = (x_0, y_0)$ with eigenvalues $\lambda_1$ and $\lambda_2$, so that in the coordinates of the principal axes of $H$ its quadratic form becomes
$$q_H(h, k) = \lambda_1 h^2 + \lambda_2 k^2,$$
then the second derivative test for $f(\mathbf{x})$ says:

(1) $f$ has a local minimum at $\mathbf{x}_0$ if $H$ is positive definite,
(2) $f$ has a local maximum at $\mathbf{x}_0$ if $H$ is negative definite,
(3) $f$ has a saddle point at $\mathbf{x}_0$ if $H$ is indefinite.

These conditions can be abbreviated by the signs of $\lambda_1$ and $\lambda_1\lambda_2$. That is, $f$ has a local minimum (or maximum) at $\mathbf{x}_0$ if $\lambda_1\lambda_2 > 0$ and $\lambda_1 > 0$ (or $\lambda_1\lambda_2 > 0$ and $\lambda_1 < 0$, respectively), and $\mathbf{x}_0$ is a saddle point if $\lambda_1\lambda_2 < 0$.


Example 9.12 Find and describe all the critical points of the function
$$f(x, y) = \frac{1}{3}x^3 + xy^2 - 4xy + 1.$$

Solution: The first partial derivatives of $f$ are
$$f_x = x^2 + y^2 - 4y, \qquad f_y = 2xy - 4x = 2x(y - 2).$$
Setting $f_y = 0$, we get $x = 0$ or $y = 2$. Setting $f_x = 0$, we see that if $x = 0$, then $y$ must be either $0$ or $4$, and if $y = 2$, then $x = \pm 2$. Thus, $(0, 0)$, $(0, 4)$, $(2, 2)$, $(-2, 2)$ are the critical points of $f$. To classify these critical points, we compute the second partial derivatives:
$$f_{xx} = 2x, \qquad f_{xy} = 2y - 4, \qquad f_{yy} = 2x.$$
At each critical point $(x_0, y_0)$, one can determine the eigenvalues $\lambda_1$ and $\lambda_2$ of the Hessian
$$H = \begin{bmatrix} 2x_0 & 2y_0 - 4 \\ 2y_0 - 4 & 2x_0 \end{bmatrix}.$$
These values are summarized in the following table:

| Critical point $(x_0, y_0)$ | $\lambda_1$ | $\lambda_2$ | Type |
|---|---|---|---|
| $(0, 0)$ | $4$ | $-4$ | saddle point |
| $(0, 4)$ | $4$ | $-4$ | saddle point |
| $(2, 2)$ | $4$ | $4$ | local minimum |
| $(-2, 2)$ | $-4$ | $-4$ | local maximum |
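The table of Example 9.12 can be reproduced with a short sketch of the two-variable test, using the closed-form eigenvalues $\frac{1}{2}(f_{xx} + f_{yy}) \pm \sqrt{\frac{1}{4}(f_{xx} - f_{yy})^2 + f_{xy}^2}$ of the Hessian. The function names are illustrative, and the `"test inconclusive"` branch for a singular Hessian is an addition: the text treats only the nonsingular case.

```python
import math


def classify_2d(fxx, fxy, fyy):
    """Second derivative test from the eigenvalues of [[fxx, fxy], [fxy, fyy]]."""
    mid = (fxx + fyy) / 2
    radius = math.hypot((fxx - fyy) / 2, fxy)
    lam1, lam2 = mid + radius, mid - radius
    if lam1 * lam2 < 0:
        return "saddle point"
    if lam1 * lam2 > 0:
        return "local minimum" if lam1 > 0 else "local maximum"
    return "test inconclusive"  # singular Hessian: not covered by the test


def hessian_example_9_12(x, y):
    """(f_xx, f_xy, f_yy) for f(x, y) = x^3/3 + x y^2 - 4 x y + 1."""
    return 2 * x, 2 * y - 4, 2 * x


for point in [(0, 0), (0, 4), (2, 2), (-2, 2)]:
    print(point, classify_2d(*hessian_example_9_12(*point)))
# (0, 0) saddle point
# (0, 4) saddle point
# (2, 2) local minimum
# (-2, 2) local maximum
```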


The same argument for the second derivative test for functions of two variables can be applied to functions of more than two variables. Let $f(\mathbf{x}) = f(x_1, x_2, \ldots, x_n)$ be a real-valued function whose third partial derivatives are all continuous. If $\mathbf{x}_0$ is a critical point of $f$, then from the Taylor polynomial of degree 2 about the critical point, the difference $f(\mathbf{x}) - f(\mathbf{x}_0)$ is approximately equal to $\frac{1}{2}[\mathbf{x} - \mathbf{x}_0]^T H[\mathbf{x} - \mathbf{x}_0]$, a quadratic form determined by the Hessian of $f$ at $\mathbf{x}_0$: the $n \times n$ symmetric matrix $H = H(\mathbf{x}_0) = [h_{ij}]$ given by
$$h_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x}_0).$$
The critical point can be classified as follows:
(1) $f$ has a local minimum at $\mathbf{x}_0$ if $H(\mathbf{x}_0)$ is positive definite,
(2) $f$ has a local maximum at $\mathbf{x}_0$ if $H(\mathbf{x}_0)$ is negative definite,
(3) $\mathbf{x}_0$ is a saddle point of $f$ if $H(\mathbf{x}_0)$ is indefinite.
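Example 9.13 Find the local extrema of the function
$$f(x, y, z) = x^2 + xz - 3\cos y + z^2.$$

Solution: The first partial derivatives of $f$ are
$$f_x = 2x + z, \qquad f_y = 3\sin y, \qquad f_z = x + 2z.$$
It follows that $(x, y, z)$ is a critical point of $f$ if and only if $x = z = 0$ and $y = n\pi$, where $n$ is an integer. Let $\mathbf{x}_0 = (0, 2k\pi, 0)$. The Hessian of $f$ at $\mathbf{x}_0$ is given by
$$H(\mathbf{x}_0) = \begin{bmatrix} 2 & 0 & 1 \\ 0 & 3 & 0 \\ 1 & 0 & 2 \end{bmatrix}.$$
It can be diagonalized through the congruence relation to get
$$H(\mathbf{x}_0) = \begin{bmatrix} 2 & 0 & 1 \\ 0 & 3 & 0 \\ 1 & 0 & 2 \end{bmatrix} \to \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3/2 \end{bmatrix}.$$
This shows that $\mathrm{In}(H(\mathbf{x}_0)) = (3, 0, 0)$, so $H(\mathbf{x}_0)$ is positive definite, and hence $f$ has a local minimum at $\mathbf{x}_0$. (Alternatively, one can compute the eigenvalues of $H(\mathbf{x}_0)$, which are $3$, $3$, and $1$; this implies that $H(\mathbf{x}_0)$ is positive definite.)

On the other hand, at a critical point of the form $\mathbf{x}_1 = (0, (2k - 1)\pi, 0)$, the Hessian is
$$H(\mathbf{x}_1) = \begin{bmatrix} 2 & 0 & 1 \\ 0 & -3 & 0 \\ 1 & 0 & 2 \end{bmatrix},$$
and one finds $\mathrm{In}(H(\mathbf{x}_1)) = (2, 1, 0)$, since the eigenvalues of $H(\mathbf{x}_1)$ are $-3$, $3$, and $1$. Thus $H(\mathbf{x}_1)$ is indefinite and so $\mathbf{x}_1$ is a saddle point of $f$.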
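When second partial derivatives are tedious to compute by hand, the Hessian can be approximated numerically and classified through the signs of its pivots, which by Sylvester's law of inertia give the same inertia as the eigenvalues. This sketch uses central differences with a step `h = 1e-4` (an arbitrary choice), assumes that all pivots are nonzero, and reproduces the two cases of Example 9.13; all names are illustrative.

```python
import math


def f(v):
    x, y, z = v
    return x * x + x * z - 3 * math.cos(y) + z * z


def hessian(func, x0, h=1e-4):
    """Central-difference approximation of the Hessian of func at x0."""
    n = len(x0)

    def value(i, si, j, sj):
        v = list(x0)
        v[i] += si * h
        v[j] += sj * h
        return func(v)

    return [[(value(i, 1, j, 1) - value(i, 1, j, -1)
              - value(i, -1, j, 1) + value(i, -1, j, -1)) / (4 * h * h)
             for j in range(n)] for i in range(n)]


def pivots(H, tol=1e-6):
    """Pivots of forward elimination without row interchanges."""
    M = [row[:] for row in H]
    n = len(M)
    result = []
    for k in range(n):
        if abs(M[k][k]) < tol:
            raise ValueError("zero pivot: this simple test does not apply")
        result.append(M[k][k])
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            M[i] = [M[i][j] - m * M[k][j] for j in range(n)]
    return result


def classify(H):
    d = pivots(H)
    if all(p > 0 for p in d):
        return "local minimum"
    if all(p < 0 for p in d):
        return "local maximum"
    return "saddle point"


print(classify(hessian(f, [0.0, 0.0, 0.0])))      # local minimum
print(classify(hessian(f, [0.0, math.pi, 0.0])))  # saddle point
```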




Problem 9.10 For each of the following functions, determine whether the given 
critical point corresponds to a local minimum, local maximum, or saddle point: 
(1) f(z, y) = 3a —ayt+y? at (0, 0); 
(2) f(a, y, z) =z? + zyz +y? — 32 at (1, 0, 0). 


9.6 Bilinear forms 


We have seen that the inertia characterizes the geometry of quadratic forms. A quadratic form is actually a special case of a bilinear form, and Euclidean geometry is based on one particular positive definite quadratic form: the dot product, which is also a bilinear form of the kind described in this chapter. Other quadratic forms abound in the mathematical zoo, as well as in mechanics and physics. Dieudonné has said that there is hardly a mathematical theory that does not involve a bilinear form; a few examples are

- in analysis, Hilbert and Sobolev spaces,

- in arithmetic, the decomposition of integers into sums of squares,

- in differential geometry, Riemannian metrics and Lorentzian metrics,

- in mechanics, the whole machinery of symplectic geometry, as well as tensors, etc.

In this chapter, we present some basic concepts of bilinear forms and prove Sylvester's law of inertia (Theorem 9.15), which says that the inertia of the diagonal matrices is independent of the diagonalization by congruence. We also introduce some physical applications to relativity and mechanics.


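Definition 9.5 Let $V$ and $W$ be two complex vector spaces. A sesquilinear form is a complex-valued function $b: V \times W \to \mathbb{C}$ satisfying

(1) $b(k\mathbf{x} + \ell\mathbf{x}', \mathbf{y}) = \bar{k}\, b(\mathbf{x}, \mathbf{y}) + \bar{\ell}\, b(\mathbf{x}', \mathbf{y})$,

(2) $b(\mathbf{x}, k\mathbf{y} + \ell\mathbf{y}') = k\, b(\mathbf{x}, \mathbf{y}) + \ell\, b(\mathbf{x}, \mathbf{y}')$,

for any $\mathbf{x}, \mathbf{x}' \in V$, $\mathbf{y}, \mathbf{y}' \in W$ and any complex scalars $k$, $\ell$. When $V = W$, a sesquilinear form satisfying

(3) $b(\mathbf{x}, \mathbf{y}) = \overline{b(\mathbf{y}, \mathbf{x})}$ (or $b(\mathbf{x}, \mathbf{y}) = -\overline{b(\mathbf{y}, \mathbf{x})}$)

is called a Hermitian form (or a skew-Hermitian form).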




In the real case, the term "sesquilinear form" becomes "bilinear form" since conjugation has no effect, the superscript $H$ is replaced by $T$, and "Hermitian" is replaced by "symmetric".


Example 9.14 Let $A$ be an $m \times n$ real matrix and let $b: \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ be defined by $b(\mathbf{x}, \mathbf{y}) = \mathbf{x}^T A\mathbf{y}$ for $\mathbf{x} \in \mathbb{R}^m$, $\mathbf{y} \in \mathbb{R}^n$. Then $b$ is clearly a bilinear form. When $m = n$, it is (skew-)symmetric if and only if $A$ is (skew-)symmetric. Thus, a real quadratic form on $\mathbb{R}^n$ may be obtained from a symmetric bilinear form.


Example 9.15 In the above example, if $A = I$, the identity matrix, then $b$ is the dot product on $\mathbb{R}^n$. In general, any inner product $\langle \mathbf{x}, \mathbf{y} \rangle$ on a real vector space $V$ is a symmetric bilinear form on $V$.

The converse is not true in general, but a symmetric bilinear form satisfying an additional condition (namely, $b(\mathbf{x}, \mathbf{x}) > 0$ for all $\mathbf{x} \neq \mathbf{0}$) is an inner product.


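Example 9.16 In the plane $\mathbb{R}^2$, the determinant function
$$\det[\mathbf{u} \ \mathbf{v}] = \det\begin{bmatrix} a & c \\ b & d \end{bmatrix} = ad - bc$$
for $\mathbf{u} = \begin{bmatrix} a \\ b \end{bmatrix}$ and $\mathbf{v} = \begin{bmatrix} c \\ d \end{bmatrix} \in \mathbb{R}^2$ is a skew-symmetric bilinear form, by the definition of the determinant.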


Problem 9.11 Show that a bilinear form $b$ on $\mathbb{R}^n$ is skew-symmetric if and only if $b(\mathbf{x}, \mathbf{x}) = 0$ for all $\mathbf{x} \in \mathbb{R}^n$.


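Example 9.17 For any $m \times n$ complex matrix $A$, the function $b: \mathbb{C}^m \times \mathbb{C}^n \to \mathbb{C}$ defined by $b(\mathbf{x}, \mathbf{y}) = \mathbf{x}^H A\mathbf{y}$ for $\mathbf{x} \in \mathbb{C}^m$, $\mathbf{y} \in \mathbb{C}^n$ is clearly a sesquilinear form. When $m = n$, it is Hermitian if and only if $A$ is Hermitian. In fact, $b(\mathbf{x}, \mathbf{y}) = \mathbf{x}^H A\mathbf{y}$ is conjugate linear in the first variable and linear in the second, and $\overline{\mathbf{y}^H A\mathbf{x}} = \mathbf{x}^H A\mathbf{y}$ for all $\mathbf{x}, \mathbf{y} \in \mathbb{C}^n$ if and only if the matrix $A$ is Hermitian.

As before, when $A = I_n$, it is the dot product on $\mathbb{C}^n$. In general, any complex inner product $\langle \mathbf{x}, \mathbf{y} \rangle$ on a complex vector space is a Hermitian form. Conversely, a Hermitian form is an inner product on $\mathbb{C}^n$ when it satisfies an additional condition.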


In this way, an abstract definition of an inner product on a vector space may be given as a symmetric bilinear (or Hermitian) form with an additional condition (see Corollary 9.12).




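Let $b: V \times W \to \mathbb{C}$ be a sesquilinear form on $V \times W$, and let $\alpha = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$ and $\beta = \{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n\}$ be bases for $V$ and $W$, respectively. Such a form is completely determined by its values $b(\mathbf{v}_i, \mathbf{w}_j)$ on the basis vectors $\mathbf{v}_i \in \alpha$, $\mathbf{w}_j \in \beta$. In fact, if
$$\mathbf{x} = x_1\mathbf{v}_1 + x_2\mathbf{v}_2 + \cdots + x_m\mathbf{v}_m, \qquad \mathbf{y} = y_1\mathbf{w}_1 + y_2\mathbf{w}_2 + \cdots + y_n\mathbf{w}_n$$
are vectors in $V$ and $W$, then
$$b(\mathbf{x}, \mathbf{y}) = \sum_{i,j} \bar{x}_i y_j\, b(\mathbf{v}_i, \mathbf{w}_j) = [\mathbf{x}]_\alpha^H A[\mathbf{y}]_\beta,$$
where $A = [a_{ij}]$ with $a_{ij} = b(\mathbf{v}_i, \mathbf{w}_j)$. $A$ is called the matrix representation of $b$ with respect to the bases $\alpha$ and $\beta$, written $A = [b]_\alpha^\beta$.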

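Let $\alpha'$ and $\beta'$ be other bases for $V$ and $W$. Let $P = [\mathrm{id}]_{\alpha'}^{\alpha}$ be the transition matrix from $\alpha'$ to $\alpha$ in $V$, and $Q = [\mathrm{id}]_{\beta'}^{\beta}$ the transition matrix from $\beta'$ to $\beta$ in $W$. Then, for any $\mathbf{x} \in V$ and $\mathbf{y} \in W$, we get $[\mathbf{x}]_\alpha = P[\mathbf{x}]_{\alpha'}$, $[\mathbf{y}]_\beta = Q[\mathbf{y}]_{\beta'}$, and
$$b(\mathbf{x}, \mathbf{y}) = [\mathbf{x}]_\alpha^H A[\mathbf{y}]_\beta = [\mathbf{x}]_{\alpha'}^H (P^H AQ)[\mathbf{y}]_{\beta'},$$
and so $[b]_{\alpha'}^{\beta'} = P^H [b]_\alpha^\beta Q$. In particular, if $V = W$, $\alpha = \beta$ and $\alpha' = \beta'$, then $P = Q$ and
$$[b]_{\alpha'} = Q^H [b]_\alpha Q.$$
This also shows that two matrix representations of a sesquilinear form $b$ with respect to different bases are congruent, and conversely any two congruent matrices can be matrix representations of the same sesquilinear form.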


One can easily see that a sesquilinear form is Hermitian (or skew-Hermitian) 
if and only if its matrix representation is Hermitian (or skew-Hermitian) for 
any basis. 

The following theorem shows how a quadratic form and a bilinear form 
are related. 


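Theorem 9.6 If $b$ is a Hermitian form on $\mathbb{C}^n$, then the function $q(\mathbf{x}) = b(\mathbf{x}, \mathbf{x})$, for $\mathbf{x} \in \mathbb{C}^n$, is a complex quadratic form.

Conversely, for every complex quadratic form $q$, there is a unique Hermitian form $b$ such that $q(\mathbf{x}) = b(\mathbf{x}, \mathbf{x})$ for all $\mathbf{x}$ in $\mathbb{C}^n$.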




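Proof: Again we can write $b$ as $b(\mathbf{x}, \mathbf{y}) = \mathbf{x}^H A\mathbf{y}$ with a Hermitian matrix $A$. Thus $q(\mathbf{x}) = b(\mathbf{x}, \mathbf{x}) = \mathbf{x}^H A\mathbf{x}$ clearly defines a complex quadratic form.

Conversely, let $q(\mathbf{x}) = \mathbf{x}^H A\mathbf{x}$ be a complex quadratic form with a Hermitian matrix $A$. Then
$$b(\mathbf{x}, \mathbf{y}) = \mathbf{x}^H A\mathbf{y}, \quad \mathbf{x}, \mathbf{y} \in \mathbb{C}^n,$$
clearly defines a Hermitian form such that $q(\mathbf{x}) = b(\mathbf{x}, \mathbf{x})$ for all $\mathbf{x}$ in $\mathbb{C}^n$. For the uniqueness, let $b'$ be another Hermitian form such that $q(\mathbf{x}) = b'(\mathbf{x}, \mathbf{x})$ for all $\mathbf{x}$ in $\mathbb{C}^n$. Then, from the polarity of $b'$:
$$b'(\mathbf{x} + \mathbf{y}, \mathbf{x} + \mathbf{y}) = b'(\mathbf{x}, \mathbf{x}) + b'(\mathbf{y}, \mathbf{y}) + b'(\mathbf{x}, \mathbf{y}) + \overline{b'(\mathbf{x}, \mathbf{y})},$$
$$b'(\mathbf{x} - \mathbf{y}, \mathbf{x} - \mathbf{y}) = b'(\mathbf{x}, \mathbf{x}) + b'(\mathbf{y}, \mathbf{y}) - b'(\mathbf{x}, \mathbf{y}) - \overline{b'(\mathbf{x}, \mathbf{y})},$$
together with the same identities applied to $i\mathbf{x}$ and $\mathbf{y}$, we have, by a direct computation,
$$b'(\mathbf{x}, \mathbf{y}) = \frac{1}{4}\left[q(\mathbf{x} + \mathbf{y}) - q(\mathbf{x} - \mathbf{y})\right] + \frac{i}{4}\left[q(i\mathbf{x} + \mathbf{y}) - q(i\mathbf{x} - \mathbf{y})\right] = \mathbf{x}^H A\mathbf{y} = b(\mathbf{x}, \mathbf{y}).$$

In the real case, since $b'(\mathbf{x}, \mathbf{y}) = b'(\mathbf{y}, \mathbf{x})$, the polarity of $b'$ becomes
$$b'(\mathbf{x} + \mathbf{y}, \mathbf{x} + \mathbf{y}) = b'(\mathbf{x}, \mathbf{x}) + 2b'(\mathbf{x}, \mathbf{y}) + b'(\mathbf{y}, \mathbf{y}),$$
and so the uniqueness is obtained:
$$b'(\mathbf{x}, \mathbf{y}) = \frac{1}{2}\left[q(\mathbf{x} + \mathbf{y}) - q(\mathbf{x}) - q(\mathbf{y})\right] = \mathbf{x}^T A\mathbf{y} = b(\mathbf{x}, \mathbf{y})$$
for a symmetric bilinear form $b'$ such that $q(\mathbf{x}) = b'(\mathbf{x}, \mathbf{x})$ for all $\mathbf{x}$ in $\mathbb{R}^n$.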
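The polarization formula in the proof can be checked numerically with Python's built-in complex numbers. In this sketch `herm_form` evaluates $\mathbf{x}^H A\mathbf{y}$ and `polarize` rebuilds $b(\mathbf{x}, \mathbf{y})$ from the quadratic form alone; both names are hypothetical, and the matrix and vectors are arbitrary test data.

```python
def herm_form(A, x, y):
    """b(x, y) = x^H A y, conjugate linear in x and linear in y."""
    n = len(A)
    return sum(x[i].conjugate() * A[i][j] * y[j]
               for i in range(n) for j in range(n))


def polarize(q, x, y):
    """Recover b(x, y) from q(v) = b(v, v) by the polarization formula."""
    def comb(u, s, v):
        return [ui + s * vi for ui, vi in zip(u, v)]

    ix = [1j * xi for xi in x]
    return ((q(comb(x, 1, y)) - q(comb(x, -1, y))) / 4
            + 1j * (q(comb(ix, 1, y)) - q(comb(ix, -1, y))) / 4)


A = [[2, 1 - 1j], [1 + 1j, 3]]  # a Hermitian matrix
x, y = [1 + 2j, -1j], [0.5, 2 - 1j]


def q(v):
    return herm_form(A, v, v)


print(abs(polarize(q, x, y) - herm_form(A, x, y)) < 1e-12)  # True
```

Expanding the four terms shows that the formula recovers $b(\mathbf{x}, \mathbf{y})$ for any form that is conjugate linear in the first variable and linear in the second; Hermitian symmetry is what makes $q$ real-valued.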
Lemma 9.7 The set $L(V^2; F)$ of all bilinear forms on a vector space $V$ of dimension $n$ is a vector space of dimension $n^2$. The set $SL(V^2; F)$ of all symmetric bilinear forms and the set $AL(V^2; F)$ of all skew-symmetric bilinear forms are subspaces of $L(V^2; F)$ of dimension $\frac{n(n+1)}{2}$ and $\frac{n(n-1)}{2}$, respectively.


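Proof: Let $\alpha = \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_n\}$ be a basis for $V$, and $\alpha^* = \{\mathbf{v}^1, \mathbf{v}^2, \ldots, \mathbf{v}^n\}$ its dual basis for $V^*$. Define
$$\mathbf{v}^i \cdot \mathbf{v}^j : V \times V \to F \quad\text{by}\quad \mathbf{v}^i \cdot \mathbf{v}^j(\mathbf{x}, \mathbf{y}) = \mathbf{v}^i(\mathbf{x})\,\mathbf{v}^j(\mathbf{y}).$$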




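Then one can easily show that $\{\mathbf{v}^i \cdot \mathbf{v}^j \mid i, j = 1, \ldots, n\}$ is linearly independent, and for any bilinear form $b \in L(V^2; F)$,
$$b = \sum_{i,j=1}^{n} b_{ij}\, \mathbf{v}^i \cdot \mathbf{v}^j, \quad\text{where } b_{ij} = b(\mathbf{v}_i, \mathbf{v}_j).$$

On the other hand, let us define
$$\mathbf{v}^i \odot \mathbf{v}^j : V \times V \to F \quad\text{by}\quad \mathbf{v}^i \odot \mathbf{v}^j = \frac{1}{2}\left(\mathbf{v}^i \cdot \mathbf{v}^j + \mathbf{v}^j \cdot \mathbf{v}^i\right),$$
$$\mathbf{v}^i \wedge \mathbf{v}^j : V \times V \to F \quad\text{by}\quad \mathbf{v}^i \wedge \mathbf{v}^j = \frac{1}{2}\left(\mathbf{v}^i \cdot \mathbf{v}^j - \mathbf{v}^j \cdot \mathbf{v}^i\right).$$

Then, clearly, $\mathbf{v}^i \odot \mathbf{v}^j = \mathbf{v}^j \odot \mathbf{v}^i$, called the symmetric product, is a symmetric bilinear form, and $\mathbf{v}^i \wedge \mathbf{v}^j = -\mathbf{v}^j \wedge \mathbf{v}^i$, called the exterior product or wedge product, is a skew-symmetric bilinear form. Then $\{\mathbf{v}^i \odot \mathbf{v}^j \mid 1 \le i \le j \le n\}$ and $\{\mathbf{v}^i \wedge \mathbf{v}^j \mid 1 \le i < j \le n\}$ form bases for $SL(V^2; F)$ and $AL(V^2; F)$, respectively. In fact, it is clear that they are linearly independent sets. For a symmetric bilinear form $g$, we have $g_{ij} = g(\mathbf{v}_i, \mathbf{v}_j) = g(\mathbf{v}_j, \mathbf{v}_i) = g_{ji}$, and
$$g = \sum_{i,j=1}^{n} g_{ij}\, \mathbf{v}^i \cdot \mathbf{v}^j = \sum_{i=1}^{n} g_{ii}\, \mathbf{v}^i \odot \mathbf{v}^i + \sum_{i<j} 2g_{ij}\, \mathbf{v}^i \odot \mathbf{v}^j.$$
For a skew-symmetric bilinear form $h$, we have $h_{ij} = h(\mathbf{v}_i, \mathbf{v}_j) = -h(\mathbf{v}_j, \mathbf{v}_i) = -h_{ji}$ (so $h_{ii} = 0$), and
$$h = \sum_{i,j=1}^{n} h_{ij}\, \mathbf{v}^i \cdot \mathbf{v}^j = \sum_{i<j} 2h_{ij}\, \mathbf{v}^i \wedge \mathbf{v}^j.$$
Counting the basis elements gives the dimensions $\frac{n(n+1)}{2}$ and $\frac{n(n-1)}{2}$.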
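In coordinates with respect to $\alpha$, the form $\mathbf{v}^i \cdot \mathbf{v}^j$ has the matrix unit $E_{ij}$, so with the factor $\frac{1}{2}$ used above, $\mathbf{v}^i \odot \mathbf{v}^j$ has matrix $(E_{ij} + E_{ji})/2$ and $\mathbf{v}^i \wedge \mathbf{v}^j$ has matrix $(E_{ij} - E_{ji})/2$. The following sketch, assuming NumPy and using illustrative helper names, splits an arbitrary matrix $B = [b_{ij}]$ into its symmetric and skew-symmetric parts, rebuilds each from the expansions in the proof, and counts the basis elements.

```python
import numpy as np

def E(i, j, n):
    # Matrix of the bilinear form v^i . v^j with respect to the basis alpha.
    M = np.zeros((n, n))
    M[i, j] = 1.0
    return M

def sym_prod(i, j, n):   # v^i (.) v^j = (v^i.v^j + v^j.v^i) / 2
    return (E(i, j, n) + E(j, i, n)) / 2

def wedge(i, j, n):      # v^i ^ v^j = (v^i.v^j - v^j.v^i) / 2
    return (E(i, j, n) - E(j, i, n)) / 2

n = 3
B = np.arange(1.0, n * n + 1).reshape(n, n)  # an arbitrary bilinear form
G = (B + B.T) / 2                            # symmetric part g
H = (B - B.T) / 2                            # skew-symmetric part h

sym_basis = [(i, j) for i in range(n) for j in range(i, n)]       # i <= j
skew_basis = [(i, j) for i in range(n) for j in range(i + 1, n)]  # i < j

# g = sum_i g_ii v^i(.)v^i + sum_{i<j} 2 g_ij v^i(.)v^j
G_rebuilt = sum((G[i, j] if i == j else 2 * G[i, j]) * sym_prod(i, j, n)
                for i, j in sym_basis)
# h = sum_{i<j} 2 h_ij v^i ^ v^j
H_rebuilt = sum(2 * H[i, j] * wedge(i, j, n) for i, j in skew_basis)

print(np.allclose(G_rebuilt + H_rebuilt, B))    # True
print(len(sym_basis), len(skew_basis), n * n)   # 6 3 9
```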


9.7 Duality 


Note that a Hermitian form may also be given the names in Definition 9.4 according to its corresponding quadratic form. Also, congruent matrices have the same rank because the transition matrix $P$ is nonsingular, and so is $P^H$.


Definition 9.6 The rank of a Hermitian form b on a vector space V, writ- 
ten rank(b), is defined as the rank of any matrix representation of b. 




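Example 9.18 Let $b : \mathbb{R}^2 \times \mathbb{R}^2 \to \mathbb{R}$ be defined by $b(\mathbf{x}, \mathbf{y}) = x_1y_1 + 3x_1y_2 + 2x_2y_1 - x_2y_2$ with respect to the standard basis $\alpha = \{\mathbf{e}_1, \mathbf{e}_2\}$. Then $b$ is clearly a bilinear form, and the matrix representation of $b$ with respect to $\alpha$ is
$$[b]_\alpha = \begin{bmatrix} 1 & 3 \\ 2 & -1 \end{bmatrix},$$
which is not symmetric. If $\beta = \{\mathbf{v}_1 = (1, 0), \mathbf{v}_2 = (1, 1)\}$ is another basis for $\mathbb{R}^2$, then the matrix representation of $b$ with respect to $\beta$ becomes
$$[b]_\beta = \begin{bmatrix} 1 & 4 \\ 3 & 5 \end{bmatrix},$$
because $b(\mathbf{v}_1, \mathbf{v}_1) = 1$, $b(\mathbf{v}_1, \mathbf{v}_2) = 4$, $b(\mathbf{v}_2, \mathbf{v}_1) = 3$ and $b(\mathbf{v}_2, \mathbf{v}_2) = 5$. Since $\operatorname{rank}[b]_\alpha = \operatorname{rank}[b]_\beta = 2$, the rank of $b$ is 2.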
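The change of basis in Example 9.18 is the congruence $[b]_\beta = P^T[b]_\alpha P$, where the columns of $P$ are the $\alpha$-coordinates of $\mathbf{v}_1$ and $\mathbf{v}_2$. A short check of this, assuming NumPy is available:

```python
import numpy as np

# [b]_alpha for b(x, y) = x1 y1 + 3 x1 y2 + 2 x2 y1 - x2 y2
A = np.array([[1, 3],
              [2, -1]])
# Columns of P are the beta-vectors v1 = (1, 0), v2 = (1, 1) in alpha-coordinates.
P = np.array([[1, 1],
              [0, 1]])

B = P.T @ A @ P   # [b]_beta = P^T [b]_alpha P
print(B.tolist())                                          # [[1, 4], [3, 5]]
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(B))  # 2 2
```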


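Problem 9.12 (1) Let $b : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$ be defined by $b(\mathbf{x}, \mathbf{y}) = x_1y_1 - 2x_1y_2 + x_2y_1 - x_3y_3$ with respect to the standard basis. Is this a bilinear form? If so, find the matrix representation of $b$ with respect to the basis
$$\alpha = \{\mathbf{v}_1 = (1, 0, 1),\ \mathbf{v}_2 = (1, 0, -1),\ \mathbf{v}_3 = (0, 1, 0)\}.$$
Find its rank.

(2) Let $V = M_{2\times 2}(\mathbb{R})$ be the vector space of $2 \times 2$ matrices, and let $b : V \times V \to \mathbb{R}$ be defined by $b(A, B) = \operatorname{tr}(A)\cdot\operatorname{tr}(B)$. Is this a bilinear form? If so, find the matrix representation of $b$ with respect to the basis
$$\alpha = \left\{ \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \right\}.$$
Find its rank.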


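Definition 9.7 A bilinear form $b : V \times W \to F$ on vector spaces $V$ and $W$ is said to be non-degenerate if it satisfies
$$b(\mathbf{v}, \mathbf{w}) = 0 \text{ for all } \mathbf{w} \in W \text{ implies } \mathbf{v} = \mathbf{0}, \quad\text{and}$$
$$b(\mathbf{v}, \mathbf{w}) = 0 \text{ for all } \mathbf{v} \in V \text{ implies } \mathbf{w} = \mathbf{0}.$$

Note that two vectors $\mathbf{v}$ and $\mathbf{w}$ are said to be orthogonal, denoted by $\mathbf{v} \perp \mathbf{w}$, if $b(\mathbf{v}, \mathbf{w}) = 0$. Since $b(\mathbf{0}, \mathbf{w}) = b(\mathbf{v}, \mathbf{0}) = 0$ for any $\mathbf{v} \in V$ and $\mathbf{w} \in W$, $\mathbf{0}$ is orthogonal to every vector, and so the non-degeneracy condition asserts that $\mathbf{0}$ is the only vector orthogonal to every vector.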

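A bilinear form $b$ on $V \times W$ naturally induces a linear transformation as follows: Let $b : V \times W \to F$ be a bilinear form. For a fixed $\mathbf{v} \in V$, we define $b_{\mathbf{v}} : W \to F$ by
$$b_{\mathbf{v}}(\mathbf{w}) = b(\mathbf{v}, \mathbf{w}) \quad\text{for } \mathbf{w} \in W.$$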




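Then the bilinearity of $b$ shows that $b_{\mathbf{v}} \in W^*$, and
$$b^* : V \to W^* \quad\text{defined by}\quad b^*(\mathbf{v}) = b_{\mathbf{v}}$$
is a linear transformation. Similarly, we have a linear transformation
$$b_* : W \to V^* \quad\text{defined by}\quad b_*(\mathbf{w}) = b_{\mathbf{w}}, \quad\text{where } b_{\mathbf{w}}(\mathbf{v}) = b(\mathbf{v}, \mathbf{w}) \text{ for } \mathbf{v} \in V \text{ and } \mathbf{w} \in W.$$

Conversely, any linear transformation $T : V \to W^*$ uniquely defines a bilinear form $b : V \times W \to F$ by the formula
$$b(\mathbf{v}, \mathbf{w}) = T(\mathbf{v})(\mathbf{w}).$$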


Theorem 9.8 A bilinear form $b : V \times W \to F$ on finite-dimensional vector spaces is non-degenerate if and only if the linear transformations $b^* : V \to W^*$ and $b_* : W \to V^*$ are isomorphisms.


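Proof: Suppose that $b$ is non-degenerate and $b^*(\mathbf{v}) = b^*(\mathbf{v}')$. Then, for all $\mathbf{w} \in W$,
$$b(\mathbf{v}, \mathbf{w}) = b_{\mathbf{v}}(\mathbf{w}) = b_{\mathbf{v}'}(\mathbf{w}) = b(\mathbf{v}', \mathbf{w}), \quad\text{or}\quad b(\mathbf{v} - \mathbf{v}', \mathbf{w}) = 0.$$
The non-degeneracy of $b$ implies that $\mathbf{v} = \mathbf{v}'$, that is, $b^*$ is one-to-one. This also implies that
$$\dim V \le \dim W^*.$$
A similar argument shows that the linear transformation $b_* : W \to V^*$ is also one-to-one, and therefore
$$\dim W \le \dim V^*.$$
Since $\dim W = \dim W^*$ and $\dim V = \dim V^*$ by Corollary 3.22, we have
$$\dim W \le \dim V^* = \dim V \le \dim W^* = \dim W.$$
Hence all these dimensions are equal, so the one-to-one maps $b^*$ and $b_*$ are also onto, and therefore isomorphisms. Conversely, if $b^*$ and $b_*$ are isomorphisms, they are in particular one-to-one, which is exactly the non-degeneracy of $b$.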


Corollary 9.9 If there exists a non-degenerate bilinear form $b : V \times W \to F$, then $\dim V = \dim W$.

Theorem 9.10 A bilinear form $b : V \times W \to F$ is non-degenerate if and only if $\operatorname{rank}(b) = \dim V = \dim W$.




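Proof: By Corollary 9.9, we may assume that $\dim V = \dim W = n$. Since every $n$-dimensional vector space is isomorphic to $F^n$ and congruent matrices have the same rank, we can assume that $V = W = F^n$ and $A = [b]_\alpha$ is the matrix representation of a bilinear form $b : F^n \times F^n \to F$ with respect to the standard basis $\alpha = \{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$ for $F^n$. Then we have $b(\mathbf{u}, \mathbf{v}) = \mathbf{u}^T A\mathbf{v}$ for any $\mathbf{u}, \mathbf{v} \in F^n$.

Suppose that $\operatorname{rank}(b) = \operatorname{rank} A < n$. Then the homogeneous system $A\mathbf{x} = \mathbf{0}$ has a nontrivial solution, say $\mathbf{v}$, and then $b(\mathbf{u}, \mathbf{v}) = \mathbf{u}^T A\mathbf{v} = 0$ for any $\mathbf{u} \in F^n$, but $\mathbf{v} \ne \mathbf{0}$. This implies that $b$ is degenerate.

Now, suppose that $\operatorname{rank}(b) = \operatorname{rank} A = n$ and $b(\mathbf{u}, \mathbf{v}) = \mathbf{u}^T A\mathbf{v} = 0$ for all $\mathbf{u} \in F^n$. By taking the basis vectors $\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n$ for $\mathbf{u}$, we see that $A\mathbf{v} = \mathbf{0}$. The condition $\operatorname{rank} A = n$ implies that $\mathbf{v} = \mathbf{0}$. Similarly, the equation $b(\mathbf{u}, \mathbf{v}) = \mathbf{u}^T A\mathbf{v} = (\mathbf{u}^T A\mathbf{v})^T = \mathbf{v}^T A^T\mathbf{u}$, together with $\operatorname{rank} A^T = n$, shows that $b(\mathbf{u}, \mathbf{v}) = 0$ for all $\mathbf{v} \in F^n$ implies $\mathbf{u} = \mathbf{0}$. Hence, $b$ is non-degenerate.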
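Theorem 9.10 turns non-degeneracy into a rank test. The sketch below assumes NumPy and uses a numerical tolerance for the rank (exact arithmetic would avoid this); `is_nondegenerate` and `degenerate_witness` are hypothetical helper names. For a singular $A$ it also returns a nonzero $\mathbf{v}$ with $A\mathbf{v} = \mathbf{0}$, which is exactly the vector used in the first half of the proof.

```python
import numpy as np

def is_nondegenerate(A, tol=1e-10):
    """b(u, v) = u^T A v on F^n x F^n is non-degenerate iff rank A = n (Theorem 9.10)."""
    A = np.asarray(A, dtype=float)
    return np.linalg.matrix_rank(A, tol=tol) == A.shape[0]

def degenerate_witness(A):
    """Unit vector v with A v = 0 when A is singular, so that b(u, v) = 0 for every u."""
    # Singular values come out in decreasing order; if A is singular the last one
    # is zero and the matching right singular vector lies in ker A.
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    return Vt[-1]

A_good = np.array([[1.0, 3.0], [2.0, -1.0]])   # Example 9.18, rank 2
A_bad = np.array([[1.0, 2.0], [2.0, 4.0]])     # rank 1
v = degenerate_witness(A_bad)
print(is_nondegenerate(A_good), is_nondegenerate(A_bad))  # True False
print(np.allclose(A_bad @ v, 0))                          # True
```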


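Since a square matrix has full rank if and only if it has no zero eigenvalue, we have:

Corollary 9.11 A bilinear form $b : V \times V \to F$ on a vector space $V$ is non-degenerate if and only if its matrix representation $A$ has no zero eigenvalue. In particular, if $A$ is symmetric (or Hermitian), then $b$ is non-degenerate exactly when $A$ is positive definite, negative definite, or indefinite without a zero eigenvalue.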


Corollary 9.12 An inner product on a vector space V is a positive definite 
Hermitian form. 


In the above duality, if we take $V = W$, we obtain the canonical duality of $V$: a bilinear form $b : V \times V \to F$ on a vector space $V$ defines a linear transformation
$$b^* : V \to V^*, \quad b^*(\mathbf{v}) = \mathbf{v}^* = b(\mathbf{v}, \cdot\,) \in V^* = L(V; F), \quad\text{for all } \mathbf{v} \in V,$$
which is called the canonical linear transformation by $b$, since it is defined independently of the choice of a basis for $V$. When $b$ is non-degenerate, $b^*$ is a canonical isomorphism (compare this with Definition 3.10).

Conversely, if $V^*$ is the dual space of $V$, then one can easily verify that the function $b = \langle\ ,\ \rangle : V \times V^* \to F$ given by
$$\langle \mathbf{v}, \mathbf{v}^* \rangle = b(\mathbf{v}, \mathbf{v}^*) = \mathbf{v}^*(\mathbf{v}) = \mathbf{v}(\mathbf{v}^*), \quad\text{for any } \mathbf{v} \in V,\ \mathbf{v}^* \in V^*,$$
defines a non-degenerate bilinear form on $V \times V^*$, called the canonical pairing. Hence, we obtain the canonical isomorphism $b^* : V \to (V^*)^* = V^{**}$.




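Let $b$ be a bilinear form on a vector space $V$. For any subspace $U$ of $V$, the restriction $b|_{U \times U}$, denoted by $b|_U$, is again a bilinear form on $U$, and we can consider $U^*$ as a subspace of $V^*$.

Let $b : V \times V \to \mathbb{R}$ be a non-degenerate bilinear form on a vector space $V$, and $b^* : V \to V^*$, given by $b^*(\mathbf{v}) = \mathbf{v}^* \in V^*$, its canonical isomorphism. For a subspace $U$ of $V$, the orthogonal complement of $U$ is defined by
$$U^\perp = \{\mathbf{v} \in V \mid b(\mathbf{v}, \mathbf{u}) = 0 \text{ for all } \mathbf{u} \in U\}.$$
Clearly, $U^\perp$ is a subspace of $V$. Moreover, it is isomorphic under $b^*$ to
$$U^0 = \{f \in V^* \mid f(\mathbf{u}) = 0 \text{ for all } \mathbf{u} \in U\},$$
which is a subspace of $V^*$: For $f \in U^0$, there is $\mathbf{v} \in V$ such that $b^*(\mathbf{v}) = f$. Since $0 = f(\mathbf{u}) = b^*(\mathbf{v})(\mathbf{u}) = b(\mathbf{v}, \mathbf{u})$ for all $\mathbf{u} \in U$, we have $\mathbf{v} \in U^\perp$. Conversely, if $\mathbf{v} \in U^\perp$, clearly $b^*(\mathbf{v}) \in U^0$.

For any basis $\{\mathbf{e}_1, \ldots, \mathbf{e}_k\}$ for $U$, extend it to a basis $\{\mathbf{e}_1, \ldots, \mathbf{e}_k, \mathbf{e}_{k+1}, \ldots, \mathbf{e}_n\}$ for $V$. Let $\{\mathbf{e}^1, \ldots, \mathbf{e}^n\}$, with $\mathbf{e}^i : V \to \mathbb{R}$, be the dual basis for $V^*$ defined by
$$\mathbf{e}^i(\mathbf{e}_j) = \delta_{ij}.$$
Then $\{\mathbf{e}^{k+1}, \ldots, \mathbf{e}^n\}$ forms a basis for $U^0$: for $j = k+1, \ldots, n$, clearly $\mathbf{e}^j \in U^0$, and these are linearly independent. If $f \in U^0$, set $f(\mathbf{e}_j) = f_j$; then $f_i = 0$ for $i = 1, \ldots, k$, and $f = \sum_{j=k+1}^{n} f_j\mathbf{e}^j$. Moreover, $\{\mathbf{v}_j = (b^*)^{-1}(\mathbf{e}^j) \mid j = k+1, \ldots, n\}$ forms a basis for $U^\perp$ since $b^*$ is an isomorphism. Thus, we can identify $U^\perp$ and $U^0$, and so $\dim U + \dim U^\perp = \dim V$. Note that
$$\mathbf{v} \in U^\perp \iff b^*(\mathbf{v})(\mathbf{u}) = b(\mathbf{v}, \mathbf{u}) = 0 \text{ for all } \mathbf{u} \in U \iff b^*(\mathbf{v})|_U = 0.$$
Thus $U^\perp$ is the kernel of the linear map $V \to U^*$, $\mathbf{v} \mapsto b^*(\mathbf{v})|_U$.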
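With respect to a basis, $b(\mathbf{v}, \mathbf{u}) = \mathbf{v}^T A\mathbf{u}$ for a nonsingular $A$. If the columns of a matrix $W$ form a basis of $U$, then $\mathbf{v} \in U^\perp$ exactly when $(AW)^T\mathbf{v} = \mathbf{0}$, so $U^\perp$ is a null space of dimension $n - k$. The sketch below assumes NumPy, computes the null space from an SVD with a tolerance, and uses a made-up indefinite form on $\mathbb{R}^3$; the helper names are illustrative.

```python
import numpy as np

def null_space(M, tol=1e-10):
    """Orthonormal basis (as columns) of ker M, computed from the SVD."""
    M = np.atleast_2d(M)
    _, s, Vt = np.linalg.svd(M)
    rank = int(np.sum(s > tol))
    return Vt[rank:].conj().T

def orth_complement(A, W):
    """U^perp = {v : b(v, u) = v^T A u = 0 for all u in U = col(W)}."""
    # v^T (A W) = 0  <=>  (A W)^T v = 0
    return null_space((A @ W).T)

# A non-degenerate but indefinite symmetric form on R^3.
A = np.diag([1.0, 1.0, -1.0])
W = np.array([[1.0], [0.0], [1.0]])   # U = span{(1, 0, 1)}
Uperp = orth_complement(A, W)
print(Uperp.shape[1])                 # 2 = dim V - dim U
```

Here $b(\mathbf{u}, \mathbf{u}) = 1 - 1 = 0$ for $\mathbf{u} = (1, 0, 1)$, so $U \subseteq U^\perp$ and $V \ne U \oplus U^\perp$, even though $\dim U + \dim U^\perp = \dim V$ still holds.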

Note that, even if a bilinear form b on V is non-degenerate, the restric- 
tions of b to non-trivial subspaces can be degenerate or zero. 


Definition 9.8 We call $U$ a non-degenerate subspace if the restriction $b|_U$ of $b$ to $U$ is non-degenerate, and isotropic if $b|_U$ is zero.

For example, if $b$ is a skew-symmetric bilinear form, then $b(\mathbf{v}, \mathbf{v}) = 0$ for any $\mathbf{v} \in V$, which means that any 1-dimensional subspace of $V$ is isotropic. Moreover, if $U$ is an isotropic subspace, then $U \subseteq U^\perp$.

Lemma 9.13 Let $U$ be a subspace of a vector space $V$ with a non-degenerate bilinear form $b$.

(1) If the subspace $U$ is non-degenerate, then $V = U \oplus U^\perp$.

(2) If both $U$ and $U^\perp$ are non-degenerate, then $(U^\perp)^\perp = U$.




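Proof: (1) Since $\dim U + \dim U^\perp = \dim V$, we only need to show that $U \cap U^\perp = \{\mathbf{0}\}$. If $\mathbf{u} \in U \cap U^\perp$, then $b(\mathbf{u}, \mathbf{v}) = 0$ for all $\mathbf{v} \in U$. Since $U$ is non-degenerate, $\mathbf{u} = \mathbf{0}$.

(2) Clearly, $U \subseteq (U^\perp)^\perp$. If $U$ and $U^\perp$ are both non-degenerate, then
$$\dim (U^\perp)^\perp = \dim V - \dim U^\perp = \dim U,$$
and hence $(U^\perp)^\perp = U$.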


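A sesquilinear form $b$ on a complex inner product space $V$ can also be expressed by a linear transformation in terms of the inner product as follows: For a complex inner product space $(V, \langle\ ,\ \rangle)$ with an orthonormal basis $\alpha = \{\mathbf{v}_1, \ldots, \mathbf{v}_n\}$, we identify $V$ with $\mathbb{C}^n$ via $\Phi(\mathbf{v}_i) = \mathbf{e}_i$. Then any sesquilinear form $b$ on $V$ can be written as
$$b(\mathbf{x}, \mathbf{y}) = \mathbf{x}^H A\mathbf{y}$$
in $\mathbb{C}^n$, where $A = [b]_\alpha$ with $a_{ij} = [A]_{ij} = b(\mathbf{v}_i, \mathbf{v}_j)$ and $\mathbf{x} = [\mathbf{x}]_\alpha$. But, with the complex dot product $\mathbf{u} \cdot \mathbf{v} = \mathbf{u}^H\mathbf{v}$,
$$\mathbf{x}^H A\mathbf{y} = \mathbf{x} \cdot (A\mathbf{y}) = \langle \mathbf{x}, T\mathbf{y} \rangle = (A^H\mathbf{x})^H\mathbf{y} = A^H\mathbf{x} \cdot \mathbf{y} = \langle T^*\mathbf{x}, \mathbf{y} \rangle,$$
where $T : V \to V$ is the linear transformation on $V$ corresponding to $A$ via the identification $\Phi(\mathbf{v}_i) = \mathbf{e}_i$ (see Theorem 3.11), i.e., $T(\mathbf{v}_j) = \sum_i a_{ij}\mathbf{v}_i$. Hence the matrix representation $A$ of $b$ may be considered as a linear transformation on $\mathbb{C}^n$, and $A^H$ as its dual transformation on $\mathbb{C}^{n*}$ (see Theorem 3.25 and Example 3.27). This means that there is a linear transformation $T : V \to V$ associated with $b$ such that
$$b(\mathbf{x}, \mathbf{y}) = \langle \mathbf{x}, T\mathbf{y} \rangle = \langle T^*\mathbf{x}, \mathbf{y} \rangle.$$
Therefore, for a Hermitian (respectively, skew-Hermitian) form $b$, we have $T^* = T$ (respectively, $T^* = -T$), and
$$b(\mathbf{x}, \mathbf{y}) = \langle \mathbf{x}, T\mathbf{y} \rangle = \pm\langle T\mathbf{x}, \mathbf{y} \rangle.$$
This interpretation of a sesquilinear form as a linear transformation through an inner product $\langle\ ,\ \rangle$ plays an important role in this context.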
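The identities $\mathbf{x}^H A\mathbf{y} = \langle \mathbf{x}, A\mathbf{y} \rangle = \langle A^H\mathbf{x}, \mathbf{y} \rangle$ can be checked numerically in coordinates, taking $\langle \mathbf{u}, \mathbf{v} \rangle = \mathbf{u}^H\mathbf{v}$ (the standard inner product for an orthonormal basis). A sketch, assuming NumPy and random complex data; `inner` is an illustrative helper name.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))  # [b]_alpha
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

def inner(u, v):
    # <u, v> = u^H v, conjugate-linear in the first slot (np.vdot conjugates u).
    return np.vdot(u, v)

b_xy = x.conj() @ A @ y                               # b(x, y) = x^H A y
print(np.isclose(b_xy, inner(x, A @ y)))              # True: <x, T y>
print(np.isclose(b_xy, inner(A.conj().T @ x, y)))     # True: <T* x, y>

# For a Hermitian form (A^H = A), T is self-adjoint and <x, T y> = <T x, y>.
H = (A + A.conj().T) / 2
print(np.isclose(inner(x, H @ y), inner(H @ x, y)))   # True
```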


9.8 Hermitian forms 


Definition 9.9 A bilinear form $b$ on $V$ is diagonalizable if there exists a basis $\alpha$ for $V$ such that the matrix representation $[b]_\alpha$ of $b$ with respect to $\alpha$ is diagonal, that is, $b(\mathbf{v}_i, \mathbf{v}_j) = 0$ whenever $i \ne j$ for the basis vectors $\mathbf{v}_i, \mathbf{v}_j$ in $\alpha$.




Theorem 9.14 A sesquilinear form $b$ on a vector space $V$ is Hermitian if and only if it is diagonalizable by congruence to a real diagonal matrix.

Proof: Since every Hermitian matrix is unitarily diagonalizable with real eigenvalues, and a unitary similarity $U^H A U$ is also a congruence, we only need to prove the sufficiency. Let $b$ be diagonalizable by congruence and let $\alpha$ be a basis for $V$ such that the matrix representation $[b]_\alpha$ is a real diagonal matrix. Then, for any basis $\beta$ for $V$, the matrix representations $[b]_\alpha$ and $[b]_\beta$ are congruent, say $[b]_\beta = P^H[b]_\alpha P$ for some invertible matrix $P$. Since $[b]_\alpha$ is real and diagonal, $[b]_\alpha^H = [b]_\alpha$, and we have
$$[b]_\beta^H = (P^H[b]_\alpha P)^H = P^H[b]_\alpha^H P = P^H[b]_\alpha P = [b]_\beta,$$
i.e., the matrix representation $[b]_\beta$ is Hermitian for any basis $\beta$, and so the form $b$ is Hermitian.


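Example 9.19 Let $b : \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$ be the bilinear form defined by
$$b(\mathbf{x}, \mathbf{y}) = x_1y_3 - 2x_2y_2 + 2x_2y_3 + x_3y_1 + 2x_3y_2 - x_3y_3.$$
Clearly, $b(\mathbf{x}, \mathbf{y}) = b(\mathbf{y}, \mathbf{x})$, and the matrix representation of $b$ with respect to the standard basis $\alpha = \{\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$ is
$$[b]_\alpha = \begin{bmatrix} 0 & 0 & 1 \\ 0 & -2 & 2 \\ 1 & 2 & -1 \end{bmatrix},$$
which is symmetric. Hence, the bilinear form $b$ is symmetric. By Theorem 9.14, it is diagonalizable by congruence. In fact, applying each elementary row operation to $[\,[b]_\alpha \mid I\,]$ together with the matching column operation on the left block, we get
$$[\,[b]_\alpha \mid I\,] = \left[\begin{array}{rrr|rrr} 0 & 0 & 1 & 1 & 0 & 0 \\ 0 & -2 & 2 & 0 & 1 & 0 \\ 1 & 2 & -1 & 0 & 0 & 1 \end{array}\right] \to \left[\begin{array}{rrr|rrr} -1 & 0 & 0 & 0 & 0 & 1 \\ 0 & 2 & 0 & 0 & 1 & 2 \\ 0 & 0 & -1 & 1 & -1 & -1 \end{array}\right] = [\,D \mid P^T\,],$$
so that
$$P = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & -1 \\ 1 & 2 & -1 \end{bmatrix}, \qquad D = \begin{bmatrix} -1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$$
By a direct computation, one can easily show that $P^T[b]_\alpha P = D$.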
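The congruence in Example 9.19 can be confirmed directly, assuming NumPy is available:

```python
import numpy as np

A = np.array([[0, 0, 1],
              [0, -2, 2],
              [1, 2, -1]])      # [b]_alpha from Example 9.19
P = np.array([[0, 0, 1],
              [0, 1, -1],
              [1, 2, -1]])
D = P.T @ A @ P
print(D.tolist())                              # [[-1, 0, 0], [0, 2, 0], [0, 0, -1]]
print(int(round(float(np.linalg.det(P)))))     # -1, so P is invertible
```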


Theorem 9.15 (Sylvester’s law of inertia) Let b be a Hermitian form 
on a vector space V. Then, the number of positive diagonal entries and the 
number of negative diagonal entries of any diagonal representation of b are 
both independent of the diagonal representation. 




Proof: Let $b$ be a Hermitian form on a vector space $V$ and let $\alpha = \{\mathbf{x}_1, \ldots, \mathbf{x}_p, \mathbf{x}_{p+1}, \ldots, \mathbf{x}_n\}$ be an ordered basis for $V$ for which the matrix representation of $b$ is diagonal, i.e., $b(\mathbf{x}_i, \mathbf{x}_j) = 0$ for $i \ne j$, and
$$b(\mathbf{x}_i, \mathbf{x}_i) > 0 \ \text{ for } i = 1, 2, \ldots, p, \qquad b(\mathbf{x}_i, \mathbf{x}_i) \le 0 \ \text{ for } i = p+1, \ldots, n,$$
and let $\beta = \{\mathbf{y}_1, \ldots, \mathbf{y}_{p'}, \mathbf{y}_{p'+1}, \ldots, \mathbf{y}_n\}$ be another ordered basis for $V$ for which the matrix representation of $b$ is also diagonal, with
$$b(\mathbf{y}_i, \mathbf{y}_i) > 0 \ \text{ for } i = 1, 2, \ldots, p', \qquad b(\mathbf{y}_i, \mathbf{y}_i) \le 0 \ \text{ for } i = p'+1, \ldots, n.$$
To show $p = p'$, let $U$ and $W$ be the subspaces of $V$ spanned by $\{\mathbf{x}_1, \ldots, \mathbf{x}_p\}$ and $\{\mathbf{y}_{p'+1}, \ldots, \mathbf{y}_n\}$, respectively. Then $b(\mathbf{u}, \mathbf{u}) > 0$ for any nonzero vector $\mathbf{u} \in U$ and $b(\mathbf{w}, \mathbf{w}) \le 0$ for any vector $\mathbf{w} \in W$, by the diagonality of $b$ in both $\alpha$ and $\beta$. Thus, $U \cap W = \{\mathbf{0}\}$, and
$$\dim(U + W) = \dim U + \dim W - \dim(U \cap W) = p + (n - p') \le n,$$
or $p \le p'$. Similarly, one can show $p' \le p$ to conclude $p = p'$. Therefore, any two diagonal matrix representations of $b$ have the same number of positive diagonal entries. By considering the form $-b$ instead of $b$, one also finds that any two diagonal matrix representations of $b$ have the same number of negative diagonal entries.


Corollary 9.16 Any two Hermitian matrices which are congruent have the 
same inertia. 


Corollary 9.16 in fact asserts that the signs of the eigenvalues of a Hermitian matrix match the signs of its pivots (when forward elimination needs no row exchanges), since the eigenvalues are the diagonal entries of the unitary diagonalization, while the pivots are those of the diagonalization by forward elimination.
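To see this remark in action, the sketch below compares the signs of the pivots of symmetric forward elimination with the signs of the eigenvalues, using the matrix $A$ of Example 9.20. It assumes NumPy, a numerical tolerance for deciding the sign of a value, and that no zero pivot is met (so no row exchanges are needed); `pivots` and `sign_counts` are illustrative names.

```python
import numpy as np

def pivots(A):
    """Pivots of forward elimination without row exchanges.

    Assumes every leading principal submatrix of A is nonsingular, so that
    no zero pivot is met (a simplifying assumption of this sketch).
    """
    U = np.array(A, dtype=float)
    n = U.shape[0]
    for k in range(n):
        for i in range(k + 1, n):
            U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
    return np.diag(U).copy()

def sign_counts(values, tol=1e-10):
    """(number positive, number negative, number zero) up to the tolerance."""
    values = np.asarray(values)
    return (int(np.sum(values > tol)), int(np.sum(values < -tol)),
            int(np.sum(np.abs(values) <= tol)))

A = np.array([[1, 1, 2],
              [1, 0, 3],
              [2, 3, 6]])      # matrix A of Example 9.20
print(pivots(A).tolist())      # [1.0, -1.0, 3.0]
print(sign_counts(pivots(A)), sign_counts(np.linalg.eigvalsh(A)))  # (2, 1, 0) (2, 1, 0)
```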


Definition 9.10 Let $b$ be a Hermitian form with $\operatorname{In}(b) = (p, q, k)$. The number $q$ of negative eigenvalues of a matrix representation $A$ of $b$ is called the index of $b$. The difference $p - q$ between the number of positive eigenvalues and the number of negative eigenvalues of $A$ is called the signature of $b$.


Hence, the index and signature, together with the rank, of a symmetric or Hermitian matrix are invariants under the congruence relation, and any two of these invariants determine the third, that is,

the index = the number of negative eigenvalues = $q$,
the signature = the number of positive eigenvalues $-$ the index = $p - q$,
the rank = the number of positive eigenvalues + the index = $p + q$.


We have already shown the necessity part of the following corollary.

Corollary 9.17 Two Hermitian matrices are congruent if and only if they have the same inertia $\operatorname{In}(A) = (p, q, k)$, that is, the same index, signature and rank.

Proof: Suppose that $A = P^H D P$ and $B = Q^H E Q$, where $D$ and $E$ are diagonal matrices with the same index $q$ and signature $p - q$, whose diagonal entries are ordered as positive, negative and zero. Let $d_i$ denote the $i$-th diagonal entry of $D$, and let $r = p + q$ be the rank. Define the diagonal matrix $R$ whose $i$-th diagonal entry $r_i$ is given by
$$r_i = \begin{cases} 1/\sqrt{d_i} & \text{if } 1 \le i \le p, \\ 1/\sqrt{-d_i} & \text{if } p < i \le r, \\ 1 & \text{if } r < i \le n. \end{cases}$$
Then
$$R^H D R = \begin{bmatrix} I_p & 0 & 0 \\ 0 & -I_q & 0 \\ 0 & 0 & 0 \end{bmatrix} = J_{p,q}.$$
Hence $A$ is congruent to $J_{p,q}$, and similarly so is $B$. It follows that $A$ is congruent to $B$.
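The scaling matrix $R$ in the proof is easy to form from the diagonal of $D$. A sketch assuming NumPy, with made-up diagonal entries and a hypothetical helper `normalize_diagonal`:

```python
import numpy as np

def normalize_diagonal(d, tol=1e-12):
    """Diagonal of R with R^H D R = J_{p,q}, following the proof of Corollary 9.17."""
    d = np.asarray(d, dtype=float)
    r = np.ones_like(d)                              # r_i = 1 on the zero entries
    nonzero = np.abs(d) > tol
    r[nonzero] = 1.0 / np.sqrt(np.abs(d[nonzero]))   # 1/sqrt(d_i) or 1/sqrt(-d_i)
    return r

d = np.array([4.0, 9.0, -2.0, 0.0])   # p = 2, q = 1, one zero entry
R = np.diag(normalize_diagonal(d))
J = R.conj().T @ np.diag(d) @ R
print(np.allclose(J, np.diag([1.0, 1.0, -1.0, 0.0])))   # True
```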


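Thus, by a suitable choice of basis $\{\mathbf{e}_1, \ldots, \mathbf{e}_p, \mathbf{e}_{p+1}, \ldots, \mathbf{e}_{p+q}, \mathbf{e}_{p+q+1}, \ldots, \mathbf{e}_n\}$ and its dual basis $\{\mathbf{e}^1, \ldots, \mathbf{e}^p, \mathbf{e}^{p+1}, \ldots, \mathbf{e}^{p+q}, \mathbf{e}^{p+q+1}, \ldots, \mathbf{e}^n\}$, a symmetric form may be expressed as
$$b = \sum_{i=1}^{p} \mathbf{e}^i \odot \mathbf{e}^i - \sum_{i=1}^{q} \mathbf{e}^{p+i} \odot \mathbf{e}^{p+i}.$$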




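Example 9.20 Determine the index, the signature and the rank of each of the following matrices:
$$A = \begin{bmatrix} 1 & 1 & 2 \\ 1 & 0 & 3 \\ 2 & 3 & 6 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 5 \end{bmatrix}, \quad C = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 2 \\ 1 & 2 & 1 \end{bmatrix}.$$
Which are congruent to each other?

Solution: As in Example 9.7, we can show that the matrices $A$ and $C$ are congruent to the diagonal matrices
$$D_A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & 3 \end{bmatrix} \quad\text{and}\quad D_C = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & -4 \end{bmatrix},$$
respectively, and the matrix $B$ is already diagonal. Since $\operatorname{In}(A) = (2, 1, 0)$, $\operatorname{In}(B) = (3, 0, 0)$ and $\operatorname{In}(C) = (2, 1, 0)$, we conclude by Corollary 9.17 that $A$ and $C$ are congruent, and $B$ is congruent to neither $A$ nor $C$. (Note that it is not necessary to find the eigenvalues of $C$ to determine its invariants.)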
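The invariants in Example 9.20 can also be read off numerically. This sketch assumes NumPy and counts the signs of eigenvalues with a tolerance, whereas the text uses pivots; by Sylvester's law of inertia both give the same counts. The name `inertia` is illustrative.

```python
import numpy as np

def inertia(M, tol=1e-10):
    """In(M) = (p, q, k): numbers of positive, negative and zero eigenvalues."""
    ev = np.linalg.eigvalsh(np.asarray(M, dtype=float))
    return (int(np.sum(ev > tol)), int(np.sum(ev < -tol)),
            int(np.sum(np.abs(ev) <= tol)))

mats = {
    "A": [[1, 1, 2], [1, 0, 3], [2, 3, 6]],
    "B": [[1, 0, 0], [0, 4, 0], [0, 0, 5]],
    "C": [[1, 0, 1], [0, 1, 2], [1, 2, 1]],
}
for name, M in mats.items():
    p, q, k = inertia(M)
    # index q, signature p - q, rank p + q
    print(name, (p, q, k), "index", q, "signature", p - q, "rank", p + q)
```

Two matrices are then congruent exactly when their `inertia` tuples agree (Corollary 9.17).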


Problem 9.13 Prove that if the diagonal entries of a diagonal matrix are per- 
muted, then the resulting diagonal matrix is congruent to the original one. 


Problem 9.14 Prove that the total number of distinct equivalence classes of congruent $n \times n$ real symmetric matrices is equal to $\frac{1}{2}(n+1)(n+2)$.


Problem 9.15 Find the signature, the index and the rank of each of the following 
matrices. 


0 12 12 3 1 0 1 
ofi -2 Ji afi] afore] 
2 3 4 3.5 6 12 1 


Which are congruent to each other?


9.9 The eigenvalues of Hermitian forms 


In the physical world, many natural laws can be expressed as a minimum principle: the minimization of the total energy. Such energies can be expressed by positive definite quadratic forms, whose derivatives are linear. Hence minimization leads to a linear equation when we set the first derivatives to zero. This optimization method may also be used to estimate the eigenvalues of a Hermitian matrix as the solutions of a series of optimization problems.




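Theorem 9.18 The paraboloid $F(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T A\mathbf{x} - \mathbf{x}^T\mathbf{b}$ for a positive definite symmetric matrix $A$ takes its minimum $F_{\min} = -\frac{1}{2}\mathbf{b}^T A^{-1}\mathbf{b}$ at the solution $\mathbf{x}_0$ of $A\mathbf{x} = \mathbf{b}$.

Proof: Suppose that $\mathbf{x}_0$ is a solution of $A\mathbf{x} = \mathbf{b}$. Then, for any $\mathbf{x}$, using $\mathbf{b} = A\mathbf{x}_0$,
$$\begin{aligned} F(\mathbf{x}) - F(\mathbf{x}_0) &= \tfrac{1}{2}\mathbf{x}^T A\mathbf{x} - \mathbf{x}^T\mathbf{b} - \tfrac{1}{2}\mathbf{x}_0^T A\mathbf{x}_0 + \mathbf{x}_0^T\mathbf{b} \\ &= \tfrac{1}{2}\mathbf{x}^T A\mathbf{x} - \mathbf{x}^T A\mathbf{x}_0 + \tfrac{1}{2}\mathbf{x}_0^T A\mathbf{x}_0 \\ &= \tfrac{1}{2}(\mathbf{x} - \mathbf{x}_0)^T A(\mathbf{x} - \mathbf{x}_0) \ge 0, \end{aligned}$$
since $A$ is positive definite. It is zero only when $\mathbf{x} = \mathbf{x}_0$.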


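Example 9.21 Minimize
$$F(x, y) = x^2 - xy + y^2 - ax - by = \frac{1}{2}\begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} x & y \end{bmatrix}\begin{bmatrix} a \\ b \end{bmatrix}.$$

Solution: With the positive definite matrix $A = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}$ and $\mathbf{b} = (a, b)$, the minimum is at $\mathbf{x} = A^{-1}\mathbf{b}$, where
$$F_{\min} = \frac{1}{2}(A^{-1}\mathbf{b})^T A(A^{-1}\mathbf{b}) - (A^{-1}\mathbf{b})^T\mathbf{b} = -\frac{1}{2}\mathbf{b}^T A^{-1}\mathbf{b}.$$

In physical terms, $\frac{1}{2}\mathbf{x}^T A\mathbf{x}$ is the internal energy and $-\mathbf{x}^T\mathbf{b}$ is the external work, and the system automatically goes to the equilibrium point where the total energy $F$ is minimum.

In fact, by calculus, the minimum is where the derivatives are zero:
$$\begin{bmatrix} F_x \\ F_y \end{bmatrix} = \begin{bmatrix} 2x - y - a \\ -x + 2y - b \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} - \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$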
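For Example 9.21 with a specific right-hand side, the minimizer and $F_{\min}$ can be checked numerically. The sketch assumes NumPy and picks $(a, b) = (1, 4)$ arbitrarily; the vector is named `rhs` to avoid a clash with the scalar $b$. Here $A^{-1} = \frac{1}{3}\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$, so $\mathbf{x}_0 = (2, 3)$ and $F_{\min} = -\frac{1}{2}(1 \cdot 2 + 4 \cdot 3) = -7$.

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
rhs = np.array([1.0, 4.0])        # (a, b) = (1, 4), an arbitrary choice

def F(v):
    # F(x) = 1/2 x^T A x - x^T b
    return 0.5 * v @ A @ v - v @ rhs

x0 = np.linalg.solve(A, rhs)      # the minimizer solves A x = b
F_min = -0.5 * rhs @ np.linalg.solve(A, rhs)

print(np.round(x0, 10).tolist(), round(float(F_min), 10))   # [2.0, 3.0] -7.0
# F(x) - F(x0) = 1/2 (x - x0)^T A (x - x0) >= 0 at random nearby points:
rng = np.random.default_rng(0)
print(all(F(x0 + d) >= F(x0) for d in rng.standard_normal((100, 2))))  # True
```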
Since the eigenvalues are the zeros of the characteristic polynomial, computing the eigenvalues in $A\mathbf{x} = \lambda\mathbf{x}$ is usually not easy, and we discussed some practical methods for the computation of eigenvalues in Section 7.8. However, for a Hermitian matrix $A$, Theorem 9.19 gives very useful information about the eigenvalues as the solutions of a maximization and a minimization problem of a quadratic form.

Definition 9.11 The Rayleigh quotient of a Hermitian matrix $A$ is the function $R_A$ defined by
$$R_A(\mathbf{x}) = \frac{\mathbf{x}^H A\mathbf{x}}{\mathbf{x}^H\mathbf{x}}$$
for $\mathbf{x} \ne \mathbf{0}$.




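Theorem 9.19 (Rayleigh-Ritz Theorem) Let $A$ be a Hermitian matrix, and let the eigenvalues of $A$ be $\lambda_{\min} = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n = \lambda_{\max}$ in increasing order. Then,

(1) $\lambda_{\min} \le R_A(\mathbf{x}) \le \lambda_{\max}$ for all $\mathbf{x} \ne \mathbf{0}$.

(2) $R_A(\mathbf{x}) = \lambda$ if $\mathbf{x}$ is an eigenvector of $A$ belonging to an eigenvalue $\lambda$.

In particular, $\lambda_{\min} = \min_{\mathbf{x} \ne \mathbf{0}} R_A(\mathbf{x})$ and $\max_{\mathbf{x} \ne \mathbf{0}} R_A(\mathbf{x}) = \lambda_{\max}$.

Proof: (1) Write $A = UDU^H$, where $U = [\mathbf{u}_1 \cdots \mathbf{u}_n]$ is the unitary matrix consisting of orthonormal eigenvectors of $A$, and $D$ is the diagonal matrix with $\lambda_1, \lambda_2, \ldots, \lambda_n$ on the diagonal. By the change of coordinates $U^H\mathbf{x} = \mathbf{y} = [y_1\ y_2\ \cdots\ y_n]^T$, we have
$$\mathbf{x}^H A\mathbf{x} = \mathbf{y}^H D\mathbf{y} = \lambda_1|y_1|^2 + \lambda_2|y_2|^2 + \cdots + \lambda_n|y_n|^2,$$
and $\|\mathbf{x}\| = \|\mathbf{y}\|$ because $U$ is unitary. It implies that
$$\lambda_1\big(|y_1|^2 + \cdots + |y_n|^2\big) \le \mathbf{x}^H A\mathbf{x} = \lambda_1|y_1|^2 + \cdots + \lambda_n|y_n|^2 \le \lambda_n\big(|y_1|^2 + \cdots + |y_n|^2\big),$$
that is, $\lambda_{\min}\|\mathbf{x}\|^2 \le \mathbf{x}^H A\mathbf{x} \le \lambda_{\max}\|\mathbf{x}\|^2$, since $\lambda_1 = \lambda_{\min}$ and $\lambda_n = \lambda_{\max}$ are the smallest and the largest eigenvalues.

(2) If $\mathbf{x}$ is an eigenvector of $A$ belonging to $\lambda$, then
$$\mathbf{x}^H A\mathbf{x} = \lambda\mathbf{x}^H\mathbf{x} = \lambda\|\mathbf{x}\|^2.$$
In particular, if $\mathbf{x}$ is an eigenvector of $A$ belonging to $\lambda_{\max}$ (or $\lambda_{\min}$, respectively), then $R_A(\mathbf{x}) = \lambda_{\max}$ (or $\lambda_{\min}$, respectively).

In particular, by taking $\mathbf{x} = \mathbf{e}_i$, we see that all the diagonal entries of $A$ lie between $\lambda_{\min}$ and $\lambda_{\max}$.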
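Theorem 9.19 and the remark on diagonal entries can be checked on a random Hermitian matrix. A sketch assuming NumPy, whose `numpy.linalg.eigh` returns eigenvalues in increasing order together with orthonormal eigenvectors; `rayleigh` is an illustrative helper.

```python
import numpy as np

def rayleigh(A, x):
    """R_A(x) = x^H A x / x^H x for x != 0 (real part; exactly real for Hermitian A)."""
    return (np.vdot(x, A @ x) / np.vdot(x, x)).real

rng = np.random.default_rng(2)
n = 5
M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (M + M.conj().T) / 2                 # a random Hermitian matrix
lam, U = np.linalg.eigh(A)               # eigenvalues in increasing order

samples = [rayleigh(A, rng.standard_normal(n) + 1j * rng.standard_normal(n))
           for _ in range(1000)]
print(lam[0] <= min(samples) and max(samples) <= lam[-1])       # True
print(np.isclose(rayleigh(A, U[:, -1]), lam[-1]))               # True: eigenvector of lambda_max
diag = np.diag(A).real
print(bool(np.all((lam[0] <= diag) & (diag <= lam[-1]))))      # True: diagonal entries
```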


Problem 9.16 Find the maximum and the minimum values of the Rayleigh quotient of each of
2) A= —3 0 1 
, ( ) : 


Problem 9.17 Find the maximum and minimum values of the quadratic form
$$2x_1^2 + 2x_2^2 + 3x_1x_2$$
subject to the constraint $x_1^2 + x_2^2 = 1$, and determine the values of $x_1$ and $x_2$ at which the maximum and minimum occur.




Problem 9.18 Find the maximum and minimum of the following quadratic forms subject to the constraint $x_1^2 + x_2^2 + x_3^2 = 1$, and determine the values of $x_1$, $x_2$ and $x_3$ at which the maximum and minimum occur:

(1) $x_1^2 + x_2^2 + 2x_3^2 - 2x_1x_2 + 4x_1x_3 + 4x_2x_3$,

(2) $2x_1^2 + x_2^2 + x_3^2 + 2x_1x_3 + 2x_1x_2$.


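Example 9.22 Find the maximum and the minimum of the quadratic form
$$x_1^2 + x_2^2 + 4x_1x_2$$
subject to the constraint $x_1^2 + x_2^2 = 1$.

Solution: The quadratic form can be written as
$$f(\mathbf{x}) = x_1^2 + x_2^2 + 4x_1x_2 = \mathbf{x}^T A\mathbf{x} = \begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$
Let $\mathbf{x} = (\cos\theta, \sin\theta)$ be a unit vector in $\mathbb{R}^2$. Then the above function can be written as
$$f(\theta) = x_1^2 + x_2^2 + 4x_1x_2 = 1 + 4\cos\theta\sin\theta,$$
which takes its extrema at the solutions $\theta$ of the equation
$$f'(\theta) = 4(\cos^2\theta - \sin^2\theta) = 4(\cos\theta - \sin\theta)(\cos\theta + \sin\theta) = 0.$$

Figure 9.7: extreme of the quadratic form

The solutions are $\theta_1 = \frac{\pi}{4}, \frac{5\pi}{4}$ and $\theta_2 = \frac{3\pi}{4}, \frac{7\pi}{4}$. At $\theta_1$, $\mathbf{x}_1 = \pm\left(\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}\right)$, and at $\theta_2$, $\mathbf{x}_2 = \pm\left(\frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}\right)$. Moreover, $f(\theta_1) = 3$ and $f(\theta_2) = -1$. Thus, the eigenvalues of $A$ are $3$ and $-1$ with the unit eigenvectors $\mathbf{x}_1$ and $\mathbf{x}_2$.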
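A brute-force check of Example 9.22, assuming NumPy: sample $f(\theta)$ on a fine grid and compare its extreme values with the eigenvalues of $A$.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 1.0]])

theta = np.linspace(0.0, 2 * np.pi, 100001)
f = 1 + 4 * np.cos(theta) * np.sin(theta)   # f(theta) = x^T A x on the unit circle

print(round(float(f.max()), 6), round(float(f.min()), 6))   # 3.0 -1.0
print(np.round(np.linalg.eigvalsh(A), 10).tolist())        # [-1.0, 3.0]
```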




Example 9.23 Consider a symmetric matrix: 


—1 0 0 

1 1.2 1.6 
—1.2 1.72 0.96 
—1.6 0.96 2.28 


oor bb 


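By choosing any vector, called a trial vector, say $\mathbf{x} = (0, 0, 0, 1)$, we get its Rayleigh quotient $R_A(\mathbf{x}) = 2.28$, and so $\lambda_{\min} \le 2.28 \le \lambda_{\max}$. If we take $\mathbf{x} = (1, 1, 1, 1)$, then $R_A(\mathbf{x}) = -0.68$. Thus,
$$\lambda_{\min} \le -0.68, \qquad 2.28 \le \lambda_{\max}.$$

To improve this, we take any two vectors $\mathbf{u}$ and $\mathbf{v}$ in general, and compute $R_A(\mathbf{x}_t)$ for $\mathbf{x}_t = \mathbf{u} + t\mathbf{v}$ (for real $\mathbf{u}$, $\mathbf{v}$ and a real symmetric $A$, so that $\mathbf{u}^H A\mathbf{v} = \mathbf{v}^H A\mathbf{u}$):
$$R_A(\mathbf{x}_t) = \frac{\mathbf{u}^H A\mathbf{u} + 2\mathbf{u}^H A\mathbf{v}\,t + \mathbf{v}^H A\mathbf{v}\,t^2}{\mathbf{u}^H\mathbf{u} + 2\mathbf{u}^H\mathbf{v}\,t + \mathbf{v}^H\mathbf{v}\,t^2} = \frac{a + bt + ct^2}{p + qt + rt^2} = Q(t).$$
Then we get
$$\lambda_{\min} \le \min_t Q(t), \qquad \max_t Q(t) \le \lambda_{\max}.$$

If we take $\mathbf{u} = (0, 0, 0, 1)$ and $\mathbf{v} = (0, 0, 1, 0)$, then
$$Q(t) = \frac{1.72t^2 + 1.92t + 2.28}{1 + t^2},$$
and so $Q'(t) = 0$ at $t = 0.75$ or $t \approx -1.33$. Since $Q(0.75) = 3$ and $Q(-1.33) \approx 1$, we have $3 \le \lambda_{\max}$, which is a considerable improvement. This is called the second-order method of Rayleigh-Ritz.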
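Only three entries of the matrix enter $Q(t)$ when $\mathbf{u} = \mathbf{e}_4$ and $\mathbf{v} = \mathbf{e}_3$, namely $a_{44} = 2.28$, $a_{34} = 0.96$ and $a_{33} = 1.72$. Since these $\mathbf{u}$, $\mathbf{v}$ are orthonormal, the extreme values of the Rayleigh quotient on their span are the eigenvalues of the $2 \times 2$ matrix $\begin{bmatrix} 2.28 & 0.96 \\ 0.96 & 1.72 \end{bmatrix}$ (the Rayleigh-Ritz projection of $A$). A sketch assuming NumPy; it uses the exact critical point $t = -4/3$ in place of the rounded $-1.33$.

```python
import numpy as np

# Projected matrix W^T A W for W = [u v], u = e4, v = e3 (only these entries of A are needed).
Ap = np.array([[2.28, 0.96],
               [0.96, 1.72]])

def Q(t):
    # Rayleigh quotient of A at x_t = u + t v
    return (2.28 + 1.92 * t + 1.72 * t**2) / (1 + t**2)

print(round(Q(0.75), 10), round(Q(-4 / 3), 10))       # 3.0 1.0
print(np.round(np.linalg.eigvalsh(Ap), 10).tolist())  # [1.0, 3.0]
```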


This method can be extended by taking a linear combination of $k$ vectors as the trial vector, and it can be used to find all the other eigenvalues. This follows from the beautiful Theorem 9.20 below. For this, we first consider a special case.

Let $A = UDU^H$ as before. If we restrict to the vectors $\mathbf{x} \in \mathbb{C}^n$ that are orthogonal to $\mathbf{u}_1$ in Theorem 9.19, then, since $y_1 = \mathbf{u}_1^H\mathbf{x} = 0$ in $\mathbf{y} = U^H\mathbf{x}$, we have
$$\mathbf{x}^H A\mathbf{x} = \sum_{i=1}^{n} \lambda_i|y_i|^2 = \sum_{i=2}^{n} \lambda_i|y_i|^2 \ge \lambda_2\sum_{i=2}^{n} |y_i|^2 = \lambda_2\|\mathbf{x}\|^2,$$
where equality holds when $\mathbf{x}$ is an eigenvector belonging to $\lambda_2$. Thus,
$$\min_{\mathbf{x} \perp \mathbf{u}_1} R_A(\mathbf{x}) = \lambda_2.$$




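Similarly, for $k = 2, 3, \ldots, n$ (or for $k = 1, 2, \ldots, n-1$), one can show that
$$\min_{\mathbf{x} \perp \mathbf{u}_1, \ldots, \mathbf{u}_{k-1}} R_A(\mathbf{x}) = \lambda_k, \qquad \max_{\mathbf{x} \perp \mathbf{u}_n, \ldots, \mathbf{u}_{n-k+1}} R_A(\mathbf{x}) = \lambda_{n-k}.$$

Unfortunately, these formulas require explicit knowledge of the eigenvectors, about which we usually have no information. But we can obtain a more useful characterization of the eigenvalues by generalizing these formulas: Write $A = UDU^H$, and let $\mathbf{z}$ be a fixed vector in $\mathbb{C}^n$. Then, for $\mathbf{x} \perp \mathbf{z}$ and $\mathbf{x} = U\mathbf{y}$, so that $\mathbf{y} \perp U^H\mathbf{z}$, we have $\mathbf{x}^H A\mathbf{x} = \mathbf{y}^H D\mathbf{y} = \sum_{i=1}^{n} \lambda_i|y_i|^2$. Thus, for $\|\mathbf{x}\| = \|\mathbf{y}\| = 1$,
$$\begin{aligned} \max_{\mathbf{x} \perp \mathbf{z}} \mathbf{x}^H A\mathbf{x} &= \max_{\mathbf{y} \perp U^H\mathbf{z}} \mathbf{y}^H D\mathbf{y} = \max_{\mathbf{y} \perp U^H\mathbf{z}} \sum_{i=1}^{n} \lambda_i|y_i|^2 \\ &\ge \max_{\substack{\mathbf{y} \perp U^H\mathbf{z} \\ y_1 = \cdots = y_{n-2} = 0}} \sum_{i=1}^{n} \lambda_i|y_i|^2 \\ &= \max_{\substack{\mathbf{y} \perp U^H\mathbf{z} \\ |y_{n-1}|^2 + |y_n|^2 = 1}} \left(\lambda_{n-1}|y_{n-1}|^2 + \lambda_n|y_n|^2\right) \ge \lambda_{n-1}. \end{aligned}$$
(The second maximum is taken over a nonempty set, since a 2-dimensional coordinate subspace always meets the hyperplane $\mathbf{y} \perp U^H\mathbf{z}$ in a nonzero vector.) This holds for arbitrary $\mathbf{z}$, and the two inequalities become equalities if $\mathbf{z}$ is an eigenvector belonging to $\lambda_n$ and $\mathbf{x} = \mathbf{u}_{n-1}$. Hence, we have
$$\min_{\mathbf{z} \in \mathbb{C}^n} \max_{\mathbf{x} \perp \mathbf{z}} \frac{\mathbf{x}^H A\mathbf{x}}{\mathbf{x}^H\mathbf{x}} = \min_{\mathbf{z} \in \mathbb{C}^n} \max_{\mathbf{x} \perp \mathbf{z}} R_A(\mathbf{x}) = \lambda_{n-1}.$$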


This means that the second largest eigenvalue can be characterized as the minimax value of the Rayleigh quotient $R_A(\mathbf{x})$.

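Intuitively, this has a natural geometric interpretation. For a positive definite symmetric matrix $A$, the equation $\mathbf{x}^T A\mathbf{x} = 1$ describes an ellipsoid whose principal axes lie along the eigenvectors, with semi-axis lengths $1/\sqrt{\lambda_i}$. Cutting it by the hyperplane orthogonal to $\mathbf{z}$ gives an ellipsoid of one lower dimension, whose shortest semi-axis has length $1/\sqrt{\max_{\mathbf{x} \perp \mathbf{z}} R_A(\mathbf{x})}$. Hence this shortest semi-axis is not shorter than $1/\sqrt{\lambda_n}$ and not longer than $1/\sqrt{\lambda_{n-1}}$, and it equals $1/\sqrt{\lambda_{n-1}}$ when $\mathbf{z} = \mathbf{u}_n$.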
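The minimax characterization of $\lambda_{n-1}$ can be tested numerically: for any $\mathbf{z}$, the maximum of $R_A$ over $\mathbf{z}^\perp$ is the largest eigenvalue of $A$ compressed to that hyperplane, which should never fall below $\lambda_{n-1}$ and should equal it for $\mathbf{z} = \mathbf{u}_n$. The sketch assumes NumPy, builds an orthonormal basis of $\mathbf{z}^\perp$ from a complete QR factorization, and uses a random real symmetric matrix; the helper name is illustrative.

```python
import numpy as np

def max_rayleigh_orthogonal_to(A, z):
    """max of R_A(x) over x orthogonal to z: the largest eigenvalue of A
    compressed to the hyperplane z^perp."""
    n = A.shape[0]
    # The first column of a complete QR of z is parallel to z; the others span z^perp.
    Qf, _ = np.linalg.qr(z.reshape(n, 1), mode="complete")
    B = Qf[:, 1:]
    return np.linalg.eigvalsh(B.conj().T @ A @ B)[-1]

rng = np.random.default_rng(3)
n = 5
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
lam, U = np.linalg.eigh(A)

vals = [max_rayleigh_orthogonal_to(A, rng.standard_normal(n)) for _ in range(200)]
print(min(vals) >= lam[-2] - 1e-12)                                   # True
print(np.isclose(max_rayleigh_orthogonal_to(A, U[:, -1]), lam[-2]))   # True
```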

The minimax characterization of $\lambda_{n-1}$ looks a little more complicated than the previous formulas, but it does not require any information about the eigenvectors of $A$. In general, by a simple adjustment of the indices in the above proof, we have:


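Theorem 9.20 (Minimax, Maximin principle) Let $A$ be a Hermitian matrix with the eigenvalues $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ in increasing order. Then, for $1 \le k \le n$,
$$\min_{\mathbf{z}_1, \ldots, \mathbf{z}_{n-k} \in \mathbb{C}^n} \ \max_{\mathbf{x} \perp \mathbf{z}_1, \ldots, \mathbf{z}_{n-k}} R_A(\mathbf{x}) = \lambda_k,$$
$$\max_{\mathbf{z}_1, \ldots, \mathbf{z}_{k-1} \in \mathbb{C}^n} \ \min_{\mathbf{x} \perp \mathbf{z}_1, \ldots, \mathbf{z}_{k-1}} R_A(\mathbf{x}) = \lambda_k.$$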


Problem 9.19 Prove Theorem 9.20. 


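Problem 9.20 Let $A$ and the eigenvalues of $A$ be as in Theorem 9.20, and let $S_k$ denote a subspace of $\mathbb{C}^n$ of dimension $k$. Show that, for $1 \le k \le n$,
$$\min_{S_k} \max_{\mathbf{x} \in S_k} R_A(\mathbf{x}) = \lambda_k, \qquad \max_{S_{n-k+1}} \min_{\mathbf{x} \in S_{n-k+1}} R_A(\mathbf{x}) = \lambda_k.$$

By taking $k = n - 1$ or $k = 2$, we get the following corollary:

Corollary 9.21 Let $S_2$ be any two-dimensional subspace of $\mathbb{C}^n$. Then
$$\lambda_{\min} = \lambda_1 \le \min_{\mathbf{x} \in S_2} R_A(\mathbf{x}) \le \lambda_{n-1}, \qquad \lambda_2 \le \max_{\mathbf{x} \in S_2} R_A(\mathbf{x}) \le \lambda_n = \lambda_{\max}.$$

Example 9.24 In Example 9.23, take $S_2$ to be the subspace spanned by $\mathbf{u}$ and $\mathbf{v}$. Then we obtain
$$\lambda_1 \le 1 \le \lambda_3, \qquad \lambda_2 \le 3 \le \lambda_4.$$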


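The simplest application of Theorem 9.20 is the comparison of the eigenvalues of $A + B$ with those of $A$:

Corollary 9.22 Let $A$ and $B$ be Hermitian matrices. Suppose that the eigenvalues $\lambda_i(A)$, $\lambda_i(B)$ and $\lambda_i(A+B)$ are arranged in increasing order. For each $k = 1, 2, \ldots, n$, we have
$$\lambda_k(A) + \lambda_1(B) \le \lambda_k(A+B) \le \lambda_k(A) + \lambda_n(B).$$

In fact, note that
$$\begin{aligned} \lambda_k(A+B) &= \min_{\mathbf{z}_1, \ldots, \mathbf{z}_{n-k} \in \mathbb{C}^n} \ \max_{\mathbf{x} \perp \mathbf{z}_1, \ldots, \mathbf{z}_{n-k}} R_{A+B}(\mathbf{x}) \\ &= \min_{\mathbf{z}_1, \ldots, \mathbf{z}_{n-k} \in \mathbb{C}^n} \ \max_{\mathbf{x} \perp \mathbf{z}_1, \ldots, \mathbf{z}_{n-k}} \big(R_A(\mathbf{x}) + R_B(\mathbf{x})\big) \\ &\le \min_{\mathbf{z}_1, \ldots, \mathbf{z}_{n-k} \in \mathbb{C}^n} \ \max_{\mathbf{x} \perp \mathbf{z}_1, \ldots, \mathbf{z}_{n-k}} R_A(\mathbf{x}) + \lambda_n(B) = \lambda_k(A) + \lambda_n(B), \end{aligned}$$
since $R_B(\mathbf{x}) \le \lambda_n(B)$ by Theorem 9.19. The lower bound follows similarly. Equality can hold: for instance, if $\mathbf{u}_i$ is a unit eigenvector of $A$ belonging to $\lambda_i(A)$, then $B = a\mathbf{u}_n\mathbf{u}_n^H$ with $a > 0$ attains the upper bound for $k = n$, and $B = a\mathbf{u}_1\mathbf{u}_1^H$ with $a < 0$ attains the lower bound for $k = 1$.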
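A random test of the inequalities in Corollary 9.22, assuming NumPy (eigenvalues from `numpy.linalg.eigvalsh` come in increasing order, matching the indexing in the text) and a small tolerance for rounding:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6

def random_symmetric(n):
    M = rng.standard_normal((n, n))
    return (M + M.T) / 2

A, B = random_symmetric(n), random_symmetric(n)
lA = np.linalg.eigvalsh(A)        # increasing order
lB = np.linalg.eigvalsh(B)
lAB = np.linalg.eigvalsh(A + B)

tol = 1e-10
lower_ok = bool(np.all(lA + lB[0] <= lAB + tol))    # lambda_k(A) + lambda_1(B) <= lambda_k(A+B)
upper_ok = bool(np.all(lAB <= lA + lB[-1] + tol))   # lambda_k(A+B) <= lambda_k(A) + lambda_n(B)
print(lower_ok, upper_ok)   # True True
```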




Corollary 9.23 Let $A$ and $B$ be Hermitian matrices. Suppose that $B$ is positive semidefinite, and that the eigenvalues $\lambda_i(A)$ and $\lambda_i(A+B)$ are arranged in increasing order. For each $k = 1, 2, \ldots, n$, we have
$$\lambda_k(A) \le \lambda_k(A+B).$$
In particular, if the rank of $B$ is 1, then between each successive pair of odd-numbered (or even-numbered) eigenvalues of $A + B$ there is at least one eigenvalue of $A$.
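Corollary 9.23 can be checked in the same way with a rank-one positive semidefinite $B = \mathbf{w}\mathbf{w}^T$. The second check below uses the standard interlacing inequality $\lambda_k(A + B) \le \lambda_{k+1}(A)$ for such $B$, which is what yields the last sentence of the corollary; the sketch assumes NumPy and random test data.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
M = rng.standard_normal((n, n))
A = (M + M.T) / 2
w = rng.standard_normal(n)
B = np.outer(w, w)                 # positive semidefinite, rank 1

lA = np.linalg.eigvalsh(A)         # increasing order
lAB = np.linalg.eigvalsh(A + B)
tol = 1e-10

print(bool(np.all(lA <= lAB + tol)))            # True: lambda_k(A) <= lambda_k(A + B)
print(bool(np.all(lAB[:-1] <= lA[1:] + tol)))   # True: lambda_k(A + B) <= lambda_{k+1}(A)
```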


9.10 Indefinite forms in special theory of relativity 


In Euclidean geometry, an inner product plays a very important role: it is a symmetric positive definite bilinear form. The magnitude of a vector is well defined because positive definiteness guarantees that nonzero vectors have positive length. It was one of the greatest discoveries of the early 20th century that an indefinite symmetric bilinear form may also be used as an inner product, and it became a basic tool of the theory of relativity. In this section, we show how such an indefinite bilinear form is incorporated into the special theory of relativity.

Let us first compare it with Newtonian physics:

(1) A material particle is either at rest or in motion (a strict dichotomy): that is, there are absolute rest, absolute motion and absolute time, so that one can tell whether any two events occur at the same time or in the same place.

(2) The additive rule of velocities holds, and hence arbitrarily large velocities can be attained (e.g., light with speed $c$ emitted from a rocket moving at speed $c/10$ darts ahead at speed $c + c/10$, and thus the speed of light plays no special role).

Thus, Newtonian space-time is modelled on the Euclidean space $\mathbb{R}^1 \times \mathbb{R}^3 = \mathbb{R}^4$, in which

1. a point $(t, \mathbf{x}) \in \mathbb{R}^1 \times \mathbb{E}^3$ represents an event at time $t$ and position $\mathbf{x}$;

2. the instantaneous happenings at a particular time $t \in \mathbb{R}$ are all the points in $\{t\} \times \mathbb{E}^3$, which is orthogonal to the time axis;

3. a Newtonian particle is described by a world line or history $(t, \mathbf{x}(t))$, where $\mathbf{x}(t) \in \mathbb{E}^3$ is the position of the material particle at time $t$.




Therefore, from an event $P_0 = (t_0, \mathbf{x}(t_0))$, Newtonian rockets can reach any distant event $(t, \mathbf{x})$ in an arbitrarily short time $t - t_0$, except for the simultaneous events in the 3-space $t = t_0$: that is, the past and the future of $P_0$ fill the whole space-time except for the 3-plane $t = t_0$. However, for over 300 years it has been known that:


(1) light travels at a very high but nonetheless finite speed $c = 3 \times 10^{10}$ cm/sec,


(2) no material object has been observed to travel faster, 


(3) the speed of light is constant regardless of the motion of a source or 
an observer (i.e., additive rule of velocity fails for light), 


(4) there is no way to determine whether two events occur at the same 
time or in the same place, 


(5) there is no way to determine whether a non-accelerating particle is 
moving or not (i.e., there is no notion of absolute speed for material 
particles. For instance, if a rocket is out in space far from any external 
influence and is not accelerating, then one cannot determine whether 
it is at rest or not). 


In short, Newtonian physics treats motion absolutely in (1) when it should be treated relatively, and treats light relatively in (2) when it should be treated absolutely.

A natural approach to remedy the Newtonian difficulties would be to 
measure the position of the rocket relative to something that is at rest. 
But one by one the candidates failed: earth, sun, any “fixed” stars, or the 
conjectured ether. 


Many mathematicians and physicists, such as Lorentz and Poincaré, tried to resolve the serious difficulties in classical Newtonian physics centering around the properties of light, and made great progress. The first comprehensive solution was given by Einstein in 1905 with the publication of his special theory of relativity. Its mathematical essence was a novel way to change space and time coordinates. In 1908 Minkowski showed that these changes occur naturally if space $\mathbb{R}^3$ and time $\mathbb{R}^1$ merge into a single spacetime $\mathbb{R}^4$. We now discuss how this can be achieved.

The basic assumptions in the theory of relativity are: 


(1) the speed of light is constant regardless of the source, 
(2) no material particle travels faster than light,


(3) there is no preferred absolute stationary observer or moving observer. 




To formulate this in a mathematical framework, we consider light emitted along a 1-dimensional ray with speed $c$, and take this as the unit on the $x$-axis: i.e., 1 unit in the space direction is the distance that light travels in unit time. We put the time axis perpendicular to the space $x$ direction. Then the world lines (or histories) of light going to the left or right, observed by the stationary observer $O$ at the origin, form the dotted cone in the following figure, and a point $(t, x)$ denotes the space-time location of a particle in this space-time.


Figure 9.8: 1 dimensional motion 


Now consider a train moving along a 1-dimensional railroad track (the $x$-axis) at a constant speed $v$. Then the world line (or history) of the midpoint of this train is described by the line $L$, whose slope is steeper than that of light since $v < c$. By assumption (2), the world line of any material object lies inside the light cone. The line $M$ is the world line of an object moving to the left of the observer at $O$.

These lines are neither actual paths of the train in space nor a picture that the observer at the origin sees when he looks at the physical world around him; rather, they represent the particular positions occupied by the midpoint of the train at time $t$ in spacetime. It might seem natural to regard locations on the $x$-axis as being "simultaneous" with the origin, since they have the same time coordinate as the origin; note that the world line of the observer $O$ himself is the time axis.

However, this notion of simultaneity is relative one, since there is no 
absolute motion but there is only motion relative to some reference point. 
Thus, an observer T on board the train is free to regard himself as being 
stationary. What is the space of simultaneity for this observer? 

When the train moves to the right of O, the observer T on board the train sees the observer O at the origin moving to the left. Thus, if we take the world line of the train observer T as the time axis in T’s diagram, then the locations in the history of the observer O are represented

410 Chapter 9. Quadratic Forms 


in the train observer T’s diagram by a line sloping to the left. Just as the locations that O judges simultaneous at any particular time all lie along a line, the locations that are simultaneous for T at a given time should also lie along a straight line in T’s diagram, though not necessarily a horizontal one. To determine this line, let us consider the parallel world lines L, M, and N of the rear end, midpoint, and front end, respectively, of the train moving at constant speed to the right, as in Figure 9.9:


Figure 9.9: Simultaneous events to a moving observer. 


Suppose that a light source is switched on at the center of the train, at a space-time point P, and photons are emitted in both directions along the train. By assumption (1), the speed of light is constant regardless of the motion of the source, so the slope of the world line of light must be the same in both T’s and O’s diagrams; hence the world lines of the photons have slopes +1 and -1 in both diagrams. These photons meet the ends of the train at the space-time locations A and C, which, from O’s point of view, are clearly not simultaneous: while one photon was traveling to the rear of the train, the rear end was traveling toward it, shortening the time until they met. On the other hand, the front end was moving away from the other photon, thus delaying their collision.

However, the observer T in the train regards the train as being stationary, 
and he knows that the light source is positioned midway between the two 
ends. Thus, the two photons appear to travel the same distance at the same 
speed to reach either end of the train, and so must arrive there at the same 
time. This means that the two “events” A and C are judged by T to be 
simultaneous. 

It is now clear from the figure above that the line AC makes the same angle with the horizontal as T’s world line makes with the vertical, since the triangle BPC is isosceles.


Theorem 9.24 Let T be an observer moving at a constant speed, with a world line making an angle θ with the vertical, and let P be any event in the space-time. Then the observer T will consider any other event Q to be simultaneous with P if and only if the line PQ makes the angle θ with the horizontal.


That is to say, for P and Q to be simultaneous, the slope of PQ must be the reciprocal of the slope of T’s world line. Therefore, the faster the observer moves, the steeper his line of simultaneous locations becomes, that is, the closer it is to his world line. Ultimately, at the speed of light, the two lines coincide, so that to an observer traveling at the speed of light all locations in his own history would be simultaneous.


Figure 9.10: Orthogonal space to an observer (T’s world line and the line of events simultaneous with P).


Therefore, it seems natural to regard the line PQ as orthogonal to the world line L of the observer T, which serves as the time axis for T. This notion is the keystone of special relativity. It turns out that the above phenomena are described very well by the plane $\mathbb{R}^2$ with the indefinite inner product


$$d(\mathbf{x}, \mathbf{y}) = -x_0y_0 + x_1y_1 = \mathbf{x}^T\begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix}\mathbf{y}.$$


This two-dimensional geometry is called the Minkowski space-time. Note that this indefinite inner product is just an indefinite symmetric bilinear form on $\mathbb{R}^2$. Thus, in the Minkowski space-time, some vectors, like (1, 0), have positive lengths, while others, like (0, 1), have negative lengths. Also, some nonzero vectors, like (1, 1), have zero length. Moreover, in this space-time two vectors such as (1, 2) and (2, 1) are orthogonal to each other, although they are not orthogonal with respect to the usual Euclidean inner product. In fact, the special theory of relativity begins with the 4-space $\mathbb{R}^4_1 = \mathbb{L}^4$ with the Minkowski inner product given by the indefinite symmetric matrix

$$\begin{bmatrix} -1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
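As a quick numerical check of the claims about (1, 1) and about the pair (1, 2), (2, 1), here is a minimal Python/NumPy sketch of the form d. The names `ETA2` and `d` are our own, and coordinates are ordered (x0, x1) exactly as in the displayed formula.

```python
import numpy as np

# Gram matrix of d(x, y) = -x0*y0 + x1*y1 on R^2, coordinates ordered (x0, x1)
ETA2 = np.diag([-1.0, 1.0])

def d(x, y):
    """Indefinite inner product d(x, y) = x^T diag(-1, 1) y."""
    return np.asarray(x, dtype=float) @ ETA2 @ np.asarray(y, dtype=float)

print(d([1, 1], [1, 1]))        # 0.0: a nonzero vector of length zero
print(d([1, 2], [2, 1]))        # 0.0: orthogonal with respect to d
print(np.dot([1, 2], [2, 1]))   # 4: not orthogonal for the Euclidean dot product
```

The same matrix-sandwich pattern `x @ G @ y` works for any symmetric Gram matrix G, which is how the 4-dimensional case can be handled as well.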



9.10.1 Lorentzian space-time 


Slightly more general space-times are known as Lorentzian space-times, after the physicist H. Lorentz:


Definition 9.12 A Lorentzian space-time is a vector space V with an in- 
definite symmetric bilinear form g, called an inner product, of index 1. 


Recall that the index of g is the number of negative eigenvalues of a symmetric matrix representing g, which is equal to the largest dimension of a subspace W on which $g|_W$ is negative definite.

Two vectors u and v are orthogonal if g(u, v) = 0. In Figure 9.11, the three pairs u and u′, v and v′, and w and w′ are orthogonal.

Since g is indefinite, the value g(v, v) of a vector v can be positive, negative, or even zero. Depending on its sign, we define the causal character of a vector as follows:

A vector v ≠ 0 in V is space-like if g(v, v) > 0, time-like if g(v, v) < 0, and null (or light-like) if g(v, v) = 0. In Figure 9.11, u and v are space-like, u′ and v′ are time-like, and w = w′ is light-like.
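The sign test in this definition is easy to automate. The sketch below is a minimal illustration, assuming the diagonal form $g = \mathrm{diag}(-1, 1, \ldots, 1)$ of the Minkowski space-time $\mathbb{R}^n_1$ with the time coordinate first (as in Definition 9.14); the function name `causal_character` and the tolerance `tol` are our own choices.

```python
import numpy as np

def causal_character(v, tol=1e-12):
    """Causal character of a nonzero v in R^n_1, where
    g = diag(-1, 1, ..., 1) and coordinate 0 is the time coordinate."""
    v = np.asarray(v, dtype=float)
    if not np.any(v):
        raise ValueError("the causal character is defined only for v != 0")
    eta = np.diag([-1.0] + [1.0] * (len(v) - 1))
    q = v @ eta @ v                       # g(v, v)
    if q > tol:
        return "space-like"
    if q < -tol:
        return "time-like"
    return "null"

print(causal_character([1, 0, 0, 0]))     # time-like
print(causal_character([0, 1, 2, 0]))     # space-like
print(causal_character([1, 0, 1, 0]))     # null
```

The tolerance only matters for floating-point input; for integer vectors the sign of g(v, v) is computed exactly.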


Figure 9.11: Orthogonal vectors (the pairs u, u′ and v, v′, and the null vector w = w′ = (1, 1)).


As before, by a suitable choice of a basis $\{u_0, u_1, \ldots, u_{n-1}\}$ for V, one can obtain a diagonal matrix representation of g (up to congruence):

$$g(v, v) = -d_0v_0^2 + \sum_{i=1}^{n-1} d_iv_i^2$$

for any $v = \sum_i v_iu_i \in V$, where $g(u_i, u_j) = \epsilon_i\delta_{ij}d_i$ with $d_i > 0$, $\epsilon_0 = -1$, and $\epsilon_i = 1$ for $i \geq 1$. This shows that all the null vectors, those with g(v, v) = 0, form an elliptic cone, called the null-cone of V; all the time-like vectors lie inside the null-cone, in the region called the time-cone; and all the remaining nonzero vectors are space-like.



Moreover, the unit space-like vectors, with g(v, v) = 1, form a hyperboloid of one sheet, and the unit time-like vectors, with g(v, v) = -1, form a hyperboloid of two sheets (see Figure 9.5).

A subspace W of V is non-degenerate if the restriction $g|_W$ of g to the subspace W is non-degenerate. One can easily derive the following lemma:


Figure 9.12: Causal characters (time-like, null, and space-like vectors)


Lemma 9.25 Let V be a Lorentzian space-time with dim V = n, and let W be a subspace of V. The following are equivalent:

(1) W is non-degenerate,
(2) $W^\perp$ is non-degenerate,
(3) $\dim W + \dim W^\perp = n = \dim V$,
(4) $(W^\perp)^\perp = W$,
(5) $V = W \oplus W^\perp$.


Definition 9.13 For a subspace $W \subseteq V$, the causal character of W is

(1) space-like if $g|_W$ is positive definite,
(2) time-like if $g|_W$ is non-degenerate of index 1,
(3) light-like or null if $g|_W$ is degenerate.


Note that the causal character of a vector $v \neq 0$ in V is the same as that of the subspace $\mathbb{R}v$. The following lemmas characterize these subspaces.
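Definition 9.13 can also be tested numerically through the Gram matrix of $g|_W$. The sketch below is a hedged illustration, assuming $V = \mathbb{R}^n_1$ with $g = \mathrm{diag}(-1, 1, \ldots, 1)$ and a linearly independent spanning set for W; it relies on Sylvester's law of inertia (the numbers of positive, negative and zero eigenvalues of the Gram matrix do not depend on the chosen basis). The name `subspace_character` is hypothetical.

```python
import numpy as np

def subspace_character(basis, tol=1e-10):
    """Causal character of W = span(basis) in R^n_1 with g = diag(-1, 1, ..., 1)."""
    W = np.column_stack([np.asarray(b, dtype=float) for b in basis])
    n, m = W.shape
    if np.linalg.matrix_rank(W) < m:
        raise ValueError("the basis vectors must be linearly independent")
    eta = np.diag([-1.0] + [1.0] * (n - 1))
    G = W.T @ eta @ W                   # matrix of g|_W in the given basis
    eig = np.linalg.eigvalsh(G)         # G is symmetric; its inertia is basis-independent
    if np.any(np.abs(eig) <= tol):
        return "light-like"             # g|_W is degenerate
    if np.all(eig > 0):
        return "space-like"             # g|_W is positive definite
    return "time-like"                  # non-degenerate; index 1 because V has index 1

print(subspace_character([[0, 1, 0], [0, 0, 1]]))    # space-like
print(subspace_character([[1, 1, 0], [1, -1, 0]]))   # time-like: two independent null vectors
print(subspace_character([[1, 1, 0], [0, 0, 1]]))    # light-like
```

The second example illustrates condition (2) of Lemma 9.26 below: a plane spanned by two linearly independent null vectors is time-like.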


Lemma 9.26 Let W be a subspace with $\dim W \geq 2$ in a Lorentzian space-time V. The following are equivalent:

(1) W is time-like,

(2) W contains two linearly independent null vectors,

(3) W contains a time-like vector,
(4) $W^\perp$ is space-like.


Hence, a vector $z \in V$ is time-like if and only if $z^\perp$ is space-like, and in this case $V = \mathbb{R}z \oplus z^\perp$.


Figure 9.13: Causal characters of subspaces 


Lemma 9.27 For a subspace W of a Lorentzian space-time V, the following are equivalent:

(1) W is light-like,
(2) W contains a light-like vector but no time-like vector,
(3) $W \cap W^\perp \neq \{0\}$,
(4) $W^\perp$ is light-like,
(5) $W \cap \Lambda = L - \{0\}$, where L is a one-dimensional subspace and Λ is the null-cone of V.


Denote by $\mathcal{T}$ the set of all time-like vectors in V, i.e., $\mathcal{T} = \{v \in V : g(v, v) < 0\}$. For a time-like vector $u \in \mathcal{T}$, the time-cone containing u is denoted by

$$C(u) = \{v \in \mathcal{T} : g(u, v) < 0\}.$$

Then $C(-u) = \{v \in \mathcal{T} : g(u, v) > 0\}$ is the opposite time-cone, and $\mathcal{T} = C(u) \cup C(-u)$ is a disjoint union.
The norm of a vector v in a Lorentzian space-time V is defined by

$$\|v\| = \sqrt{|g(v, v)|}.$$


Lemma 9.28 Two time-like vectors v, w are in the same time-cone of u 
if and only if g(v,w) < 0. 



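Proof: Let v ∈ C(u) and let w be time-like. We may suppose that u is a unit time-like vector. Write $v = au + \bar{v}$ and $w = bu + \bar{w}$, where $\bar{v}, \bar{w} \in u^\perp$ and $a, b \in \mathbb{R}$. Since $u^\perp$ is space-like (Lemma 9.26), $\|\cdot\|$ is a Euclidean norm on it, and

$$0 > g(v, v) = -a^2 + \|\bar{v}\|^2 \implies \|\bar{v}\| < |a|.$$

Similarly, $\|\bar{w}\| < |b|$. Then $0 > g(v, u) = -a$ implies $a > 0$, and $g(v, w) = -ab + g(\bar{v}, \bar{w})$. But, by the Cauchy-Schwarz inequality on $u^\perp$, $|g(\bar{v}, \bar{w})| \leq \|\bar{v}\|\|\bar{w}\| < |ab|$. Hence $\operatorname{sgn} g(v, w) = \operatorname{sgn}(-ab) = -\operatorname{sgn}(b)$, and

$$g(v, w) < 0 \iff b > 0 \iff g(u, w) < 0,$$

that is, $g(v, w) < 0$ if and only if $w \in C(u)$.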


Lemma 9.29 Let v, w be time-like vectors in V. Then

(1) $|g(v, w)| \geq \|v\|\|w\|$, and equality holds if and only if $v \parallel w$.

(2) If v and w are in the same time-cone of V, then there exists a unique number $\varphi \geq 0$, called the hyperbolic angle between v and w and denoted by $\varphi = \angle(v, w)$, such that $g(v, w) = -\|v\|\|w\|\cosh\varphi$.


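Proof: (1) Write $w = av + \bar{w}$ with $\bar{w} \in v^\perp$. Then

$$0 > g(w, w) = a^2g(v, v) + g(\bar{w}, \bar{w}),$$
$$g(v, w)^2 = a^2g(v, v)^2 = \big(g(w, w) - g(\bar{w}, \bar{w})\big)g(v, v) \geq g(w, w)g(v, v) = \|v\|^2\|w\|^2,$$

since $g(v, v) < 0$ and $g(\bar{w}, \bar{w}) \geq 0$ ($v^\perp$ is space-like). The equality holds if and only if $g(\bar{w}, \bar{w}) = 0$, that is, $\bar{w} = 0$, or $w = av$.
(2) By Lemma 9.28, $g(v, w) < 0$, so (1) implies $-g(v, w) \geq \|v\|\|w\|$. Thus there exists a unique number $\varphi \geq 0$ such that

$$1 \leq \frac{-g(v, w)}{\|v\|\|w\|} = \cosh\varphi.$$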


Corollary 9.30 If v, w are in the same time-cone, then the reversed triangle inequality holds:

$$\|v\| + \|w\| \leq \|v + w\|,$$

and equality holds if and only if $v \parallel w$.
Proof: By Lemma 9.29, $-g(v, w) \geq \|v\|\|w\|$. Thus

$$(\|v\| + \|w\|)^2 = \|v\|^2 + 2\|v\|\|w\| + \|w\|^2 \leq -g(v + w, v + w) = \|v + w\|^2.$$

Equality holds if and only if $-g(v, w) = \|v\|\|w\|$, if and only if $v \parallel w$.
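Lemma 9.29 and Corollary 9.30 can be checked on concrete vectors. The following is a minimal Python/NumPy sketch in $\mathbb{R}^4_1$; the helper names `g`, `norm` and `hyperbolic_angle`, and the sample vectors, are our own illustrative choices, not notation from the text.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])   # Minkowski space-time R^4_1

def g(u, v):
    return np.asarray(u, dtype=float) @ ETA @ np.asarray(v, dtype=float)

def norm(v):
    """||v|| = sqrt(|g(v, v)|)."""
    return np.sqrt(abs(g(v, v)))

def hyperbolic_angle(v, w):
    """The phi >= 0 of Lemma 9.29 (2), for time-like v, w in one time-cone."""
    if g(v, v) >= 0 or g(w, w) >= 0 or g(v, w) >= 0:
        raise ValueError("v, w must be time-like and in the same time-cone")
    c = -g(v, w) / (norm(v) * norm(w))   # >= 1 by Lemma 9.29 (1)
    return np.arccosh(max(c, 1.0))       # guard against rounding just below 1

v = np.array([1.0, 0.0, 0.0, 0.0])
w = 2 * np.array([np.cosh(0.5), np.sinh(0.5), 0.0, 0.0])
print(hyperbolic_angle(v, w))             # approximately 0.5
print(norm(v) + norm(w) <= norm(v + w))   # True: the reversed triangle inequality
```

Since g(v, w) < 0 characterizes "same time-cone" (Lemma 9.28), the sketch uses that sign test instead of tracking time-cones explicitly.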



9.10.2 Minkowski space-time 


Definition 9.14 A Minkowski space-time is the vector space $\mathbb{R}^4$ with the Minkowski inner product g of index 1, whose quadratic form is $g(v, v) = -v_0^2 + v_1^2 + v_2^2 + v_3^2$; it is denoted by $\mathbb{R}^4_1$.


A material particle in $\mathbb{R}^4_1$ is a time-like, future-pointing regular curve $\alpha: I \to \mathbb{R}^4_1$ such that $g(\alpha'(s), \alpha'(s)) = -1$ for all $s \in I$. The parameter s is called the proper time of the particle, and the image $\alpha(I)$ is called the world line of α.

A light-like particle is a future-pointing null line $\gamma: I \to \mathbb{R}^4_1$ such that $g(\gamma'(s), \gamma'(s)) = 0$ for all $s \in I$.

The statement that light moves in a straight line is a fundamental hypothesis of relativity. Since $g(\gamma', \gamma') = 0$ for a light-like particle γ, parametrizing γ by proper time is out of the question: being massless, it cannot carry a clock, whereas each material particle comes equipped with a clock measuring its proper time.

Let $(x^0, x^1, x^2, x^3)$ denote the standard coordinates of $\mathbb{R}^4_1$, where $e_0$ represents the future-pointing time orientation. For an event $P \in \mathbb{R}^4_1$, the future time-cone of P is

$$\Big\{Q \in \mathbb{R}^4_1 : \overrightarrow{PQ} = \sum_{i=0}^{3}\big(x^i(Q) - x^i(P)\big)e_i \text{ is time-like and future-pointing}\Big\},$$

whose boundary is the light-cone of P. Thus, an event P can influence an event Q if and only if there is a particle from P to Q.
For $P, Q \in \mathbb{R}^4_1$ with $q_j = x^j(Q)$ and $p_j = x^j(P)$, let

$$PQ = \|\overrightarrow{PQ}\| = \sqrt{\Big|-(q_0 - p_0)^2 + \sum_{j=1}^{3}(q_j - p_j)^2\Big|} \geq 0.$$

(1) If $\overrightarrow{PQ}$ is time-like and future-pointing, then PQ is the elapsed proper time of the unique freely falling material particle from P to Q.

(2) For P ≠ Q: $\overrightarrow{PQ}$ is light-like $\iff$ PQ = 0 $\iff$ there is a light-like particle through P to Q.

(3) If $\overrightarrow{PQ}$ is space-like, then PQ > 0 is the distance from P to Q measured by any freely falling observer orthogonal to $\overrightarrow{PQ}$.


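Lemma 9.31 If $\overrightarrow{OP}$ is space-like and $\overrightarrow{OQ}$ is time-like in $\mathbb{R}^4_1$, then any two of the following imply the third:

(1) $\overrightarrow{PQ}$ is light-like,
(2) $\overrightarrow{OP} \perp \overrightarrow{OQ}$,
(3) OP = OQ.


This follows from $\overrightarrow{OQ} - \overrightarrow{OP} = \overrightarrow{PQ}$ and

$$-OQ^2 - 2g(\overrightarrow{OQ}, \overrightarrow{OP}) + OP^2 = g(\overrightarrow{PQ}, \overrightarrow{PQ}) = \pm PQ^2.$$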


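Theorem 9.32 For events P, Q in the same time-cone of O such that $\overrightarrow{OP} \perp \overrightarrow{PQ}$,
(1) $OQ^2 = OP^2 - PQ^2$;
(2) $OP = OQ\cosh\varphi$ and $PQ = OQ\sinh\varphi$, where $\varphi = \angle POQ$ is the hyperbolic angle between $\overrightarrow{OP}$ and $\overrightarrow{OQ}$.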


Figure 9.14: Hyperbolic angle. 


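Proof: (1) From Figure 9.14, $\overrightarrow{OQ} = \overrightarrow{OP} + \overrightarrow{PQ}$. Since $\overrightarrow{OP} \perp \overrightarrow{PQ}$, and $\overrightarrow{PQ}$ is space-like (it lies in the space-like subspace $\overrightarrow{OP}^\perp$), this yields $-OQ^2 = -OP^2 + PQ^2$.
(2) By Lemma 9.29, $g(\overrightarrow{OP}, \overrightarrow{OQ}) = -OP\,OQ\cosh\varphi$. On the other hand,

$$g(\overrightarrow{OP}, \overrightarrow{OQ}) = g(\overrightarrow{OP}, \overrightarrow{OP} + \overrightarrow{PQ}) = -OP^2.$$

Thus $OP = OQ\cosh\varphi$, and by (1), $PQ^2 = OP^2 - OQ^2 = OQ^2(\cosh^2\varphi - 1) = OQ^2\sinh^2\varphi$.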
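A small worked example makes Theorem 9.32 concrete. The sketch below uses hypothetical numbers OP = 5 and PQ = 3 in $\mathbb{R}^4_1$, which give the "hyperbolic 3-4-5 triangle" OQ = 4, cosh φ = 5/4, sinh φ = 3/4 (so φ = ln 2).

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])

def g(u, v):
    return u @ ETA @ v

def norm(v):
    return np.sqrt(abs(g(v, v)))

OP = np.array([5.0, 0.0, 0.0, 0.0])   # time-like (hypothetical numbers)
PQ = np.array([0.0, 3.0, 0.0, 0.0])   # orthogonal to OP, hence space-like
OQ = OP + PQ

print(g(OP, PQ))                                       # zero, so OP is orthogonal to PQ
print(norm(OQ) ** 2, norm(OP) ** 2 - norm(PQ) ** 2)    # 16.0 16.0  (Theorem 9.32 (1))
phi = np.arccosh(-g(OP, OQ) / (norm(OP) * norm(OQ)))
print(norm(OQ) * np.cosh(phi), norm(OQ) * np.sinh(phi))   # about 5.0 and 3.0  (Theorem 9.32 (2))
```

Note the reversal compared with Euclidean geometry: the "hypotenuse" OQ = 4 is shorter than the leg OP = 5.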


9.10.3 Particles observed 


An observer in $\mathbb{R}^4_1$ is just a material particle, and a freely falling observer is an observer whose world line is a time-like straight line in $\mathbb{R}^4_1$. Let ω be a freely falling observer. By an isometry we may assume that the world line of ω is the $x^0$-axis, so that its proper time is $t = x^0(\omega(t))$ and its rest spaces of simultaneity are the slices $x^0 = $ constant, which are isometric to the Euclidean space $E_\omega \cong \mathbb{R}^3$. Now, for each event $P \in \mathbb{R}^4_1$, the number $x^0(P) = p_0$ is called the ω-time of P, and the point $\bar{P} = (p_1, p_2, p_3) \in \mathbb{R}^3$ is called the ω-position of P.

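Let $\alpha: I \to \mathbb{R}^4_1$ be a particle, either material or light-like (such particles are called causal). For each parameter value $s \in I$ (the proper time if α is material), $t = \alpha_0(s)$ is the ω-time and $(\alpha_1(s), \alpha_2(s), \alpha_3(s)) \in \mathbb{R}^3$ is the ω-position. Note that $\alpha_0$ is a diffeomorphism of I onto some interval $J \subseteq \mathbb{R}$, since α is causal and future-pointing, so that $d\alpha_0/ds = -g(\alpha', e_0) > 0$. Let $u: t \in J \mapsto u(t) = s \in I$ be the inverse. Then at ω-time $t \in J$ the ω-position of α is

$$\bar{\alpha}(t) = \big(\alpha_1(u(t)), \alpha_2(u(t)), \alpha_3(u(t))\big),$$

and $\bar{\alpha}$ is called the ω-Newtonian particle of α: it is what the observer ω observes of α.

The relationship between α and $\bar{\alpha}$ is a guide for the development of special relativity. Note that $dt/ds = d\alpha_0/ds > 0$ and, by the chain rule, $d\bar{\alpha}/dt = \big(d(\alpha_1, \alpha_2, \alpha_3)/ds\big)\big/(dt/ds)$.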


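Lemma 9.33 Let γ be a light-like particle in $\mathbb{R}^4_1$. The associated ω-Newtonian particle $\bar{\gamma}$ of γ is a straight line in $\mathbb{R}^3$ with speed 1.

Proof: Since γ is a geodesic (a straight line), $x^i(\gamma(s)) = a_is + b_i$ for $0 \leq i \leq 3$. Thus the projection

$$\tilde{\gamma}(s) = (\gamma_1(s), \gamma_2(s), \gamma_3(s))$$

is a straight line. Moreover, since the vector

$$\gamma'(s) = \Big(\frac{dt}{ds}, \frac{d\tilde{\gamma}}{ds}\Big),$$

where $t = x^0(\gamma(s))$, is null (i.e., $\|\gamma'\| = 0$) and dt/ds is positive, we have $dt/ds = \|d\tilde{\gamma}/ds\|$. Thus the speed of $\bar{\gamma}$ is

$$v = \Big\|\frac{d\bar{\gamma}}{dt}\Big\| = \frac{\|d\tilde{\gamma}/ds\|}{dt/ds} = 1.$$

In particular, light has the same constant speed 1 relative to every freely falling observer.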


Theorem 9.34 Let $\bar{\alpha}: J \to \mathbb{R}^3$ be the ω-Newtonian particle of a material particle $\alpha: I \to \mathbb{R}^4_1$, observed by the freely falling observer ω whose world line is the $x^0$-axis. Then

(1) the speed of $\bar{\alpha}$ relative to the observer ω is $v = \|d\bar{\alpha}/dt\| = \tanh\varphi$, where φ is the Lorentz (hyperbolic) angle between $\alpha' = d\alpha/ds$ and the instantaneous observer $e_0$, and

(2) the proper time s of α and its ω-time t are related by

$$\frac{dt}{ds} = \frac{d\alpha_0(s)}{ds} = \cosh\varphi = \frac{1}{\sqrt{1 - v^2}} \geq 1.$$

In particular, $0 \leq v < 1$, and the faster the particle with proper time s moves relative to ω (i.e., the larger v is), the slower the particle’s clock (s) runs relative to the observer’s clock (t). Therefore, moving clocks run slow.


Figure 9.15: Speed of moving clocks. 


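Proof: Since $e_0$ and α′ are unit time-like vectors in the same (future) time-cone, there is a unique Lorentz angle $\varphi \geq 0$ such that

$$\frac{dt}{ds} = \frac{d\alpha_0(s)}{ds} = -g(\alpha', e_0) = \cosh\varphi \geq 1.$$

Moreover, writing $\tilde{\alpha} = (\alpha_1, \alpha_2, \alpha_3)$, since $-1 = g(\alpha', \alpha') = -\big(\frac{dt}{ds}\big)^2 + \big\|\frac{d\tilde{\alpha}}{ds}\big\|^2$ and $\varphi \geq 0$, it follows that

$$\Big\|\frac{d\tilde{\alpha}}{ds}\Big\| = \sqrt{\cosh^2\varphi - 1} = \sinh\varphi \geq 0,$$

and hence

$$v = \Big\|\frac{d\bar{\alpha}}{dt}\Big\| = \frac{\|d\tilde{\alpha}/ds\|}{dt/ds} = \frac{\sinh\varphi}{\cosh\varphi} = \tanh\varphi.$$

Thus $\cosh\varphi = 1/\sqrt{1 - v^2}$.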
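The proof can be replayed numerically: build a unit future-pointing time-like vector α′ with Lorentz angle φ = artanh v and read off dt/ds and the observed speed. This is a minimal sketch with c = 1; the function `unit_velocity` and its `direction` parameter are hypothetical helpers introduced only for this illustration.

```python
import numpy as np

ETA = np.diag([-1.0, 1.0, 1.0, 1.0])
E0 = np.array([1.0, 0.0, 0.0, 0.0])            # the observer's unit time direction

def unit_velocity(v, direction=(1.0, 0.0, 0.0)):
    """alpha'(s) for a particle with speed v (0 <= v < 1) relative to e0,
    moving along the given spatial direction (an assumption of this sketch)."""
    phi = np.arctanh(v)                            # Lorentz angle: v = tanh(phi)
    n = np.asarray(direction, dtype=float)
    n = n / np.linalg.norm(n)
    return np.concatenate(([np.cosh(phi)], np.sinh(phi) * n))

a1 = unit_velocity(0.6)
dt_ds = -(a1 @ ETA @ E0)                           # = cosh(phi), Theorem 9.34 (2)
speed = np.linalg.norm(a1[1:]) / dt_ds             # = tanh(phi), Theorem 9.34 (1)
print(a1 @ ETA @ a1)                               # about -1: a unit time-like vector
print(dt_ds, 1 / np.sqrt(1 - 0.6 ** 2), speed)     # about 1.25 1.25 0.6
```

For v = 0.6 the particle's clock advances 1 unit while the observer's advances 1.25 units, which is the "moving clocks run slow" statement in numbers.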



9.10.4 Some relativistic effects 


Let $\omega_1$ and $\omega_2$ be two freely falling observers that are not parallel. Then they have different rest spaces, so events that are simultaneous for $\omega_1$ are not simultaneous for $\omega_2$, and vice versa. If α and β are material particles, then their instantaneous relative speed is $v = \tanh\varphi$, where φ is their hyperbolic angle. In particular, if β is a freely falling material particle parallel to a freely falling observer ω, then ω regards β as being at rest, while for other observers it can have any constant speed $0 < v < 1$; and two freely falling observers α, β regard each other as moving at the constant relative speed $\tanh(\angle(\alpha', \beta'))$. Therefore, only relative motion is defined, light moves at speed 1 relative to every freely falling observer, and all material particles have relative speeds v < 1. Thus the difficulties with Newtonian motion do not arise here: the essential dichotomy is not between rest and motion, but between free fall and acceleration, for if β″ is not identically zero, no freely falling observer considers β to be at rest.


Relativistic addition of velocities 


Consider the Lorentzian space-time $\mathbb{R}^2_1$. A rocketship ρ leaves a space station σ (both freely falling) at relative speed $v_1 > 0$, and a spaceman μ is ejected from ρ with constant speed $v_2$ relative to ρ. Note that $v_2 > 0$ means forward, away from σ, and $v_2 < 0$ means backward, toward σ. What is the speed v of μ relative to σ?

(1) The Newtonian answer is, of course, $v = v_1 + v_2$.

(2) Einstein’s answer: In Figure 9.16, ρ departs from σ at the event p and μ departs from ρ at the event q with $v_2 > 0$. Thus $v_1 = \tanh\varphi_1$ and $v_2 = \tanh\varphi_2$. The additivity of hyperbolic angles implies $\varphi = \angle(\sigma', \mu') = \varphi_1 + \varphi_2$. Therefore,

$$v = \tanh\varphi = \tanh(\varphi_1 + \varphi_2) = \frac{\tanh\varphi_1 + \tanh\varphi_2}{1 + \tanh\varphi_1\tanh\varphi_2} = \frac{v_1 + v_2}{1 + v_1v_2}.$$
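The derivation above says that relativistic velocities compose by adding hyperbolic angles. A minimal sketch in Python (units with c = 1) compares that route with the closed form; both function names are our own. Note that `math.atanh` raises `ValueError` at |v| = 1, so the hyperbolic-angle version is only for material speeds, while the closed form returns exactly 1 when one speed is 1.

```python
import math

def add_velocities(v1, v2):
    """Relativistic composition of collinear speeds (c = 1) by adding
    the hyperbolic angles phi_i = artanh(v_i); requires |v_i| < 1."""
    return math.tanh(math.atanh(v1) + math.atanh(v2))

def add_velocities_formula(v1, v2):
    """The closed form (v1 + v2) / (1 + v1 v2)."""
    return (v1 + v2) / (1 + v1 * v2)

print(add_velocities(0.5, 0.5))    # about 0.8, not the Newtonian 1.0
print(add_velocities(0.9, 0.9))    # about 0.9945, still below 1
```

Working with hyperbolic angles turns the composition into ordinary addition, which is why repeated boosts never reach speed 1.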


The twin paradox 


On their 21st birthday, Peter leaves his twin Paul behind on their freely falling spaceship and departs at constant relative speed v = 24/25 in free fall for 7 years of his proper time. Then he turns around and comes back symmetrically in another 7 years. Upon his arrival he is thus 21 + 14 = 35 years old, but Paul is 71. Indeed, from Figure 9.16, with $\cosh\varphi = 1/\sqrt{1 - v^2}$ by Theorem 9.34,

$$OX = OP\cosh\varphi = 7\big(1 - (24/25)^2\big)^{-1/2} = 7 \cdot \frac{25}{7} = 25.$$

Similarly, XQ = 25. That is, Paul’s age at Peter’s return is 21 + 2(25) = 71.
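The twin-paradox arithmetic can be packaged as a tiny function. This sketch assumes, as in the text, a symmetric out-and-back trip at constant speed v (c = 1, years) and ignores the instant of turning around; `twin_ages` is a hypothetical name.

```python
import math

def twin_ages(start_age, v, proper_time_out):
    """Ages at reunion when the traveller flies out and back symmetrically,
    each leg lasting proper_time_out of his own proper time (c = 1)."""
    cosh_phi = 1 / math.sqrt(1 - v ** 2)     # dt/ds, Theorem 9.34 (2)
    traveller = start_age + 2 * proper_time_out
    stay_at_home = start_age + 2 * proper_time_out * cosh_phi
    return traveller, stay_at_home

peter, paul = twin_ages(21, 24 / 25, 7)
print(peter, round(paul, 9))    # 35 71.0
```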


Figure 9.16: Relativistic addition of velocities, and the twin paradox. 


Lorentz-Fitzgerald contraction 


Let ω be a freely falling observer with rest space $E_\omega$ through ω(0), and let α and β be parallel freely falling particles in $\mathbb{R}^4_1$, both at constant speed v relative to ω. We may consider them as the endpoints of a freely falling rod ℓ = [α, β], so that their ω-Newtonian particles $\bar{\alpha}$ and $\bar{\beta}$ move along parallel straight lines in $E_\omega$. Thus, ω sees the rod as a line segment $[\bar{\alpha}, \bar{\beta}]$ moving in translation, parallel to itself, in $E_\omega$. The length of the rod measured by ω is the constant distance $L_\omega$ from $\bar{\alpha}(t)$ to $\bar{\beta}(t)$ in $E_\omega$. For a rider on the rod, say α, the rest length of the rod is measured to be $L_\alpha$ in his rest space $E_\alpha$. We may assume that ω(0) = α(0) = 0.

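(1) Suppose that the rod $[\bar{\alpha}(0), \bar{\beta}(0)]$ in $E_\omega$ moves orthogonally to its axis, i.e., $[\bar{\alpha}(0), \bar{\beta}(0)] \perp \bar{\alpha}'(0)$. Then $[\bar{\alpha}(0), \bar{\beta}(0)] \subseteq E_\omega \cap E_\alpha$, since ω, α and $\bar{\alpha}$ lie in the same plane and ℓ ⊥ α. This means that $L_\omega = L_\alpha$ for all t.

Figure 9.17: Lorentz-Fitzgerald contraction (1).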



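(2) Suppose that the rod $[\bar{\alpha}(0), \bar{\beta}(0)]$ in $E_\omega$ moves in the direction of its axis, i.e., $[\bar{\alpha}(0), \bar{\beta}(0)]$ and $\bar{\alpha}'(0)$ have the same direction. Thus ω, α, β, $\bar{\alpha}$, $\bar{\beta}$ all lie in the same plane.

Note that OP is the rest length $L_\alpha$ of the rod ℓ in $E_\alpha$, and $OQ = L_\omega$. Moreover, from Theorem 9.32, $L_\omega = OQ = RQ\sinh\varphi$ and $OR = RQ\cosh\varphi$, and since $\overrightarrow{OP} \perp \overrightarrow{RP}$,

$$L_\alpha = OR\sinh\varphi = RQ\sinh\varphi\cosh\varphi = L_\omega\cosh\varphi.$$

Therefore, $L_\omega = L_\alpha/\cosh\varphi = L_\alpha\sqrt{1 - v^2}$, since $v = \tanh\varphi$.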


Figure 9.18: Lorentz-Fitzgerald contraction (2). 


This phenomenon of shortening in the direction of motion is the cel- 
ebrated Lorentz-Fitzgerald contraction conjectured independently by 
Fitzgerald and Lorentz some years before Einstein’s 1905 special theory of 
relativity. 


Example 9.25 The station master ω at a station observes a train α of rest length $L_\alpha = 200$ m passing the station at a constant relative speed $v = \sqrt{3}/2$, and measures the train’s length as $L_\omega = 200\sqrt{1 - v^2} = 100$ m.

For the conductor α on the train, the train is at rest and hence is measured to be 200 m long, while the station, whose rest length is 100 m, is moving at speed v and hence is measured to be $S_\alpha = 100\sqrt{1 - v^2} = 50$ m long.

These measurements look contradictory, but the space-time diagram in Figure 9.19 shows that they coexist harmoniously: the station at rest is [γ, ω], of length 100 m, and the train is [α, β]. Suppose that the station master is at ω and the conductor at α, and that they pass each other at the event Q.

At the station time of event Q, the station master ω sees the train as [P, Q], of length 100 m, in $E_\omega$.


9.11. Skew-symmetric forms in symplectic geometry 423 


Figure 9.19: Lorentz-Fitzgerald contraction (3). 


At the train time of event Q, the conductor α sees the train as [X, Q], of length 200 m, in $E_\alpha = E_\beta$, of which only the first 50 m segment [Y, Q] is inside the station.

Event R is the crossing of the station master ω and the back β of the train. At the speed $v = \sqrt{3}/2$, the 100 m train passes him in elapsed time $OP = QR = 200/\sqrt{3}$. At the same relative speed, the conductor is passed by the 50 m station in elapsed time $OQ = PR = 100/\sqrt{3}$.
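The numbers in Example 9.25 follow from the contraction formula $L_\omega = L_\alpha\sqrt{1 - v^2}$ and from time = length / speed. A short Python sketch, with c = 1 so that times are measured in "metres of light travel"; `measured_length` is a name we introduce for illustration.

```python
import math

def measured_length(rest_length, v):
    """Length of a rod moving at speed v along its own axis, as measured
    by a freely falling observer: L = L_rest * sqrt(1 - v^2) (c = 1)."""
    return rest_length * math.sqrt(1 - v ** 2)

v = math.sqrt(3) / 2
train_seen_by_master = measured_length(200, v)        # about 100 m
station_seen_by_conductor = measured_length(100, v)   # about 50 m
# elapsed times in units where c = 1 (time for light to travel 1 m)
t_master = train_seen_by_master / v                   # about 200/sqrt(3)
t_conductor = station_seen_by_conductor / v           # about 100/sqrt(3)
print(train_seen_by_master, station_seen_by_conductor)
print(t_master, t_conductor)
```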


9.11 Skew-symmetric forms in symplectic geometry


While positive definite and indefinite symmetric bilinear forms are the building blocks of Euclidean geometry and of relativity, respectively, skew-symmetric bilinear forms, also called 2-forms on a vector space, also play very important roles in mechanics and physics. In fact, they are the building blocks of the so-called symplectic geometry.

Symplectic forms arise in a natural way in theoretical mechanics, in particular in the passage from classical to quantum mechanics. To motivate the study of symplectic geometry, we could begin with the relevant physics, but that would take us too far from our main topic, linear algebra. Readers interested in this direction may consult references such as R. Berndt ([3]).

In this section, we introduce the most basic properties of symplectic forms. Note that skew-Hermitian forms are usually not studied separately, because each of them is obtained from a Hermitian form g as ig (see page 286), so that the geometries of Hermitian and skew-Hermitian forms are essentially identical. On the other hand, the geometry of a skew-symmetric form differs in many ways from that of a symmetric form.


Example 9.26 (1) Let V be a 1-dimensional vector space with a skew-symmetric form g, and let {u} be a basis for V. Since g(u, u) = -g(u, u), we have g(u, u) = 0, and so for any v = au and w = bu in V,

$$g(v, w) = g(au, bu) = ab\,g(u, u) = 0,$$

that is, g is identically zero. Thus, a skew-symmetric form on a 1-dimensional space is not interesting.

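(2) Let (V, g) be a 2-dimensional vector space with a skew-symmetric form g. If g is degenerate, there is a nonzero vector v with g(v, w) = 0 for all w ∈ V; extending v to a basis {v, u} and using g(u, u) = 0 as in (1), we get g = 0.

Suppose g is non-degenerate. Then there is a pair of vectors $e_1, e_2$ such that $g(e_1, e_2) = \lambda \neq 0$. Replacing $e_2$ by $e_2/\lambda$, we may assume λ = 1. Then $e_1$ and $e_2$ are linearly independent, since if $e_2 = ae_1$, we have a contradiction:

$$1 = g(e_1, e_2) = a\,g(e_1, e_1) = 0.$$

Hence α = {e_1, e_2} is a basis for V, and for any $u = ae_1 + be_2$ and $v = ce_1 + de_2$,

$$g(u, v) = ad - bc = \begin{bmatrix} a & b \end{bmatrix}\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}\begin{bmatrix} c \\ d \end{bmatrix}, \quad\text{or}\quad [g]_\alpha = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.$$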


A nonzero skew-symmetric bilinear form is never diagonalizable (a diagonal skew-symmetric matrix is zero), but the following theorem describes the structure of a skew-symmetric form in general.


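Theorem 9.35 Let V be a vector space with a skew-symmetric form g. Then there exists a basis α for V with respect to which the matrix representation $[g]_\alpha$ is of the form

$$[g]_\alpha = \begin{bmatrix} J & & & \\ & \ddots & & \\ & & J & \\ & & & 0 \end{bmatrix}, \qquad J = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix},$$

consisting of some number k ≥ 0 of diagonal blocks J followed by a zero block.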



Proof: If g = 0, then $[g]_\alpha$ is the zero matrix for any basis α. Suppose that g ≠ 0, and proceed by induction on $\dim V \geq 2$. As in Example 9.26, there exists a pair of linearly independent vectors x and y in V such that g(x, y) = 1. Let U be the subspace of V spanned by x and y. Then U is non-degenerate, and so, by Lemma 9.13, $V = U \oplus U^\perp$ and $\dim U^\perp = n - 2$. Clearly, the matrix representation of the restriction $g|_U$ with respect to the basis {x, y} is $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, and the restriction $g|_{U^\perp}$ is again skew-symmetric. The same argument can be applied to $g|_{U^\perp}$ on $U^\perp$, and the theorem follows by induction.


Note that the above matrix is congruent, by a permutation matrix, to the following form:

$$\begin{bmatrix} 0 & I_k & 0 \\ -I_k & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$


Corollary 9.36 Let V be a vector space with a skew-symmetric form g. Then the rank of g is even, and V decomposes into a direct sum of pairwise orthogonal 2-dimensional subspaces $V_i$ and a subspace D on which g is identically zero:

$$V = V_1 \oplus \cdots \oplus V_k \oplus D.$$

Theorem 9.35 guarantees the existence of a basis

$$\alpha = \{e_1, \ldots, e_k, e_{k+1}, \ldots, e_{2k}, e_{2k+1}, \ldots, e_n\}$$

for V such that

$$g(e_i, e_{k+i}) = -g(e_{k+i}, e_i) = 1, \quad i = 1, \ldots, k,$$

and all other products of pairs equal zero. Such a basis is called a symplectic basis. Thus, for its dual basis

$$\{e^1, \ldots, e^k, e^{k+1}, \ldots, e^{2k}, e^{2k+1}, \ldots, e^n\},$$

the 2-form g may be expressed as

$$g = \sum_{i=1}^{k} e^i \otimes e^{k+i} - \sum_{i=1}^{k} e^{k+i} \otimes e^i = \sum_{i=1}^{k} e^i \wedge e^{k+i},$$

where $e^i \wedge e^j = e^i \otimes e^j - e^j \otimes e^i$.



Example 9.27 Let $g: \mathbb{R}^3 \times \mathbb{R}^3 \to \mathbb{R}$ be the bilinear form defined by

$$g(x, y) = x_1y_2 - x_2y_1 + x_3y_1 - x_1y_3 + x_2y_3 - x_3y_2.$$

Clearly, g(x, y) = -g(y, x), and the matrix representation of g with respect to the standard basis α = {e_1, e_2, e_3} is

$$[g]_\alpha = \begin{bmatrix} 0 & 1 & -1 \\ -1 & 0 & 1 \\ 1 & -1 & 0 \end{bmatrix},$$

which is skew-symmetric. By a simple computation, $g(e_1, e_2) = 1 = -g(e_2, e_1)$. Let U be the subspace of $\mathbb{R}^3$ spanned by $e_1$ and $e_2$, i.e., the xy-plane. If we set $W = \{v \in \mathbb{R}^3 : g(v, u) = 0 \text{ for all } u \in U\}$, then $W = \{\lambda z : \lambda \in \mathbb{R}\}$, where z = (1, 1, 1). In fact, g(z, v) = 0 for every $v \in \mathbb{R}^3$. Clearly, β = {e_1, e_2, z} is a basis for $\mathbb{R}^3$, so that

$$[g]_\beta = \begin{bmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$
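The inductive proof of Theorem 9.35 can be turned into an algorithm: pick a pair with g(x, y) ≠ 0, normalize so that g(x, y) = 1, project everything else onto $U^\perp$ via $w \mapsto w + g(w, x)\,y - g(w, y)\,x$, and repeat; whatever is left spans the subspace D on which g vanishes. The Python/NumPy sketch below is our own illustration of that procedure (the name `symplectic_basis` and the tolerance are assumptions, and no attempt is made at numerical robustness). Applied to Example 9.27 it recovers exactly the basis β = {e_1, e_2, (1, 1, 1)}.

```python
import numpy as np

def symplectic_basis(M, tol=1e-10):
    """Columns of the returned B form a basis in which the skew-symmetric
    form g(u, v) = u^T M v has matrix B^T M B = [[0, I_k, 0], [-I_k, 0, 0], [0, 0, 0]],
    following the inductive proof of Theorem 9.35."""
    M = np.asarray(M, dtype=float)
    n = M.shape[0]

    def g(u, v):
        return u @ M @ v

    rest = [np.eye(n)[:, j] for j in range(n)]    # spans the current complement
    xs, ys = [], []
    while True:
        pair = next(((a, b) for a in range(len(rest))
                     for b in range(a + 1, len(rest))
                     if abs(g(rest[a], rest[b])) > tol), None)
        if pair is None:                          # g vanishes on the span of rest
            break
        a, b = pair
        x = rest[a]
        y = rest[b] / g(rest[a], rest[b])         # now g(x, y) = 1
        xs.append(x)
        ys.append(y)
        # project the remaining vectors onto U^perp, where U = span{x, y}
        rest = [w + g(w, x) * y - g(w, y) * x
                for i, w in enumerate(rest) if i not in pair]
    D = []                                        # basis of the subspace where g = 0
    for w in rest:
        if np.linalg.matrix_rank(np.column_stack(D + [w]), tol) == len(D) + 1:
            D.append(w)
    return np.column_stack(xs + ys + D)

M = np.array([[0, 1, -1], [-1, 0, 1], [1, -1, 0]])   # Example 9.27
B = symplectic_basis(M)
print(B)              # columns e1, e2 and (1, 1, 1)
print(B.T @ M @ B)    # equals [[0, 1, 0], [-1, 0, 0], [0, 0, 0]]
```

Ordering the columns as all x's, then all y's, then D produces the permuted form shown after Theorem 9.35 rather than the block-diagonal one.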


Problem 9.21 Prove that $U \cap U^\perp = \{0\}$ in the proof of Theorem 9.35.


Problem 9.22 Show that any bilinear form g on a vector space V is the sum of a 
symmetric bilinear form and a skew-symmetric bilinear form. 


Definition 9.15 A symplectic space is a pair (V, g) of a vector space V 
and a non-degenerate skew-symmetric bilinear form g, called a symplectic 
form. 


Thus, the dimension of a symplectic space is always even, say 2k, and 
there is a symplectic basis, for which the matrix of g is of the form in 
Theorem 9.35, or the form below the theorem. 

Note that all odd-dimensional subspaces are degenerate, since the rank of a skew-symmetric form is even (Corollary 9.36). The subspaces $V_1$ spanned by $\{e_1, \ldots, e_k\}$ and $V_2$ spanned by $\{e_{k+1}, \ldots, e_{2k}\}$ are isotropic, and $V = V_1 \oplus V_2$. Moreover, the mapping

$$g_1^*: V_2 \to V_1^*, \quad\text{given by}\quad g_1^*(v_2)(v_1) = g(v_2, v_1),$$

is an isomorphism, since $\dim V_2 = \dim V_1 = \dim V_1^*$ and $\ker g_1^* = \{0\}$: if $v \in V_2$ and $g_1^*(v) = 0 \in V_1^*$, then $g^*(v) = 0 \in V^*$, since $V_2$ is already isotropic. This means that $v \in \ker g^*$, and v = 0 by the non-degeneracy of g. Therefore, $V \cong V_1^* \oplus V_1$.



Theorem 9.37 Let (V, g) be a symplectic space of dimension 2k, and let $V_1$ be an isotropic subspace of V of dimension r. Then r ≤ k. If r < k, then $V_1$ is contained in an isotropic subspace U of the maximum possible dimension k.


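Proof: If $V_1$ is isotropic, then trivially $V_1 \subseteq V_1^\perp$, and so $r = \dim V_1 \leq \dim V_1^\perp = \dim V - \dim V_1 = 2k - r$ by Lemma 9.13. Thus r ≤ k.

Note that $\dim (V_1^\perp)^\perp = \dim V - \dim V_1^\perp = \dim V_1$, again by Lemma 9.13. Since $V_1 \subseteq (V_1^\perp)^\perp$, we have $V_1 = (V_1^\perp)^\perp = V_1^\perp \cap (V_1^\perp)^\perp$, that is, $V_1$ is the kernel of $g|_{V_1^\perp}$. Then, by Theorem 9.35 applied to the possibly degenerate form $g|_{V_1^\perp}$, the space $V_1^\perp$ has a symplectic basis of the form

$$\beta = \{e_1, \ldots, e_{k-r}, e_{k-r+1}, \ldots, e_{2(k-r)}, e_{2(k-r)+1}, \ldots, e_{2k-r}\},$$

so that

$$[g|_{V_1^\perp}]_\beta = \begin{bmatrix} 0 & I_{k-r} & 0 \\ -I_{k-r} & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.$$

The vectors $\{e_{2(k-r)+1}, \ldots, e_{2k-r}\}$, which span $\ker g|_{V_1^\perp} = V_1$, together with $\{e_1, \ldots, e_{k-r}\}$ form a basis for a k-dimensional isotropic subspace containing $V_1$.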


Theorem 9.38 Let (V, g) be a symplectic space of dimension 2k, and let $V_1$ be an isotropic subspace of V of dimension k. Then there exists another isotropic subspace $V_2$ of V of dimension k such that $V = V_1 \oplus V_2$ and $g_1^*: V_2 \to V_1^*$ is an isomorphism.


Proof: Fix a symplectic basis $\{e_1, \ldots, e_{2k}\}$ for V. Consider all the possible $2^k$ pairs (I, J) of disjoint subsets I and J such that $I \cup J = \{1, \ldots, k\} = K$. For each such pair, the vectors $\{e_j, e_{k+i} \mid i \in I, j \in J\}$ span a k-dimensional isotropic subspace of V.

Let U be the subspace spanned by $\{e_1, \ldots, e_k\}$, and let $\dim(V_1 \cap U) = s$ with $0 \leq s \leq k$. There exists a subset $J \subseteq K$ with k − s elements such that the subspace W spanned by $\{e_j \mid j \in J\}$ is a complement of $V_1 \cap U$ in U, i.e., $(V_1 \cap U) + W = U$ and $(V_1 \cap U) \cap W = \{0\}$. In fact, a basis for $V_1 \cap U$ together with $\{e_1, \ldots, e_k\}$ spans U, so a basis for $V_1 \cap U$ can be extended to a basis for U by adding k − s vectors from $\{e_1, \ldots, e_k\}$; the indices of these vectors form J.

Let I = K − J, and let $V_2$ be the isotropic subspace spanned by $\{e_j, e_{k+i} \mid i \in I, j \in J\}$. To show that $V_1 \cap V_2 = \{0\}$: since $V_1 \subseteq V_1^\perp$ and $\dim V_1^\perp = 2k - k = k$ by Lemma 9.13, we have $V_1^\perp = V_1$, and similarly $V_2^\perp = V_2$. But $V_1 \cap U$ is contained in $V_1$, and W is contained in $V_2$, so that $U = (V_1 \cap U) + W$ is orthogonal to $V_1 \cap V_2$. Since U is isotropic and k-dimensional, $U^\perp = U$, so $V_1 \cap V_2 \subseteq U$, and, as $V_2 \cap U = W$,

$$V_1 \cap V_2 = (V_1 \cap U) \cap (V_2 \cap U) = (V_1 \cap U) \cap W = \{0\}.$$

Hence $V = V_1 \oplus V_2$ by counting dimensions. Finally, $g_1^*: V_2 \to V_1^*$ is an isomorphism, since $\dim V_2 = \dim V_1^* = k$ and $\ker g_1^*$ is contained in $\ker g^* = \{0\}$.


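Corollary 9.39 Suppose that a symplectic space (V, g) is written as

$$V = V_1 \oplus V_2 = V_1' \oplus V_2'$$

for isotropic subspaces $V_i$ and $V_i'$. Then there exists an isometry $f: V \to V$ such that

$$f(V_i) = V_i' \quad\text{for } i = 1, 2.$$

Proof: Choose a basis $\{e_1, \ldots, e_k\}$ for $V_1$ and its dual basis $\{e_{k+1}, \ldots, e_{2k}\}$ in $V_2$ relative to $g_1^*: V_2 \to V_1^*$. Then $\{e_1, \ldots, e_{2k}\}$ is a symplectic basis for V (after replacing $e_{k+1}, \ldots, e_{2k}$ by their negatives, if necessary, to match the sign convention). Similarly, we obtain another symplectic basis $\{e_1', \ldots, e_{2k}'\}$ with respect to $V = V_1' \oplus V_2'$, constructed in the same way. The linear map f defined by $f(e_i) = e_i'$ is clearly the required isometry.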


Definition 9.16 A linear map $T: V \to V$ of a symplectic space (V, g) of dimension 2k such that

$$g(Tu, Tv) = g(u, v) \quad\text{for all } u, v \in V$$

is said to be symplectic.
The symplectic group of (V, g) of dimension 2k is denoted by Sp(2k, ℝ). Since, in a symplectic basis α = {e_1, ..., e_{2k}} for V,

$$[g]_\alpha = \begin{bmatrix} 0 & I_k \\ -I_k & 0 \end{bmatrix} = E_{2k},$$

$A = [T]_\alpha \in Sp(2k, \mathbb{R})$ if and only if $A^TE_{2k}A = E_{2k}$. Thus, det A = ±1. In fact, we will see later that det A = 1. Since $E_{2k}^2 = -I_{2k}$, or $E_{2k}^{-1} = -E_{2k}$, we have $A = -E_{2k}(A^T)^{-1}E_{2k}$.

Let A be a non-singular skew-symmetric matrix of order 2k. The bilinear form b on $\mathbb{R}^{2k}$ given by $b(u, v) = u^TAv$ is non-degenerate and skew-symmetric. By choosing a symplectic basis β for $\mathbb{R}^{2k}$,

$$A = [b]_\alpha = B^T[b]_\beta B, \qquad [b]_\beta = E_{2k},$$

for some non-singular matrix B: actually, $B = [\mathrm{id}]_\alpha^\beta$, where α is the standard basis for $\mathbb{R}^{2k}$. Hence $\det A = (\det B)^2$.



Theorem 9.40 Let A be a skew-symmetric matrix of order 2k. Then there exists a unique polynomial Pf(A) in the entries of A, with integer coefficients, such that $\det A = (\mathrm{Pf}(A))^2$ and $\mathrm{Pf}(E_{2k}) = 1$. This polynomial is called the Pfaffian of A, and it has the property

$$\mathrm{Pf}(B^TAB) = \det B \cdot \mathrm{Pf}(A)$$

for any matrix B of order 2k.
Proof: Consider k(2k − 1) independent variables $\{a_{ij} \mid 1 \leq i < j \leq 2k\}$ over ℚ, and let K be the field of rational functions in the $a_{ij}$ over ℚ. Set $A = [a_{ij}]$ with $A^T = -A$. Define b on $K^{2k}$ by $b(u, v) = u^TAv$, which is non-degenerate and skew-symmetric. For a symplectic basis β for $K^{2k}$ as above, $\det A = (\det B)^2$, where det B is a rational function of the $a_{ij}$ with coefficients in ℚ. Since det A is a polynomial with integer coefficients, its square root must also be a polynomial with integer coefficients, by the unique factorization theorem for polynomials in $\mathbb{Z}[a_{ij}]$. The sign of $\sqrt{\det A}$ is uniquely fixed by the requirement that its value at $E_{2k}$ be 1. Thus, for skew-symmetric A and $B^TAB$,

$$\mathrm{Pf}^2(B^TAB) = \det(B^TAB) = (\det B)^2\det A = (\det B)^2\,\mathrm{Pf}^2(A).$$

Hence

$$\mathrm{Pf}(B^TAB) = \pm\det B\,\mathrm{Pf}(A).$$

The sign + is determined by the case $A = E_{2k}$ and $B = I_{2k}$.


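Example 9.28

$$\mathrm{Pf}\begin{bmatrix} 0 & a_{12} \\ -a_{12} & 0 \end{bmatrix} = a_{12}, \qquad \mathrm{Pf}\begin{bmatrix} 0 & a_{12} & a_{13} & a_{14} \\ -a_{12} & 0 & a_{23} & a_{24} \\ -a_{13} & -a_{23} & 0 & a_{34} \\ -a_{14} & -a_{24} & -a_{34} & 0 \end{bmatrix} = -a_{12}a_{34} + a_{13}a_{24} - a_{14}a_{23}.$$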
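The normalization $\mathrm{Pf}(E_{2k}) = 1$ used here differs in sign, for some k, from the common convention in which the block-diagonal matrix with blocks $\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$ has Pfaffian 1; for k = 2 the two conventions differ by a factor of -1, which is why Example 9.28 reads $-a_{12}a_{34} + a_{13}a_{24} - a_{14}a_{23}$. The Python sketch below (our own illustration; `pf_expand`, `E` and `Pf` are hypothetical names) computes the Pfaffian by first-row expansion and rescales it to the normalization of Theorem 9.40. Expansion is exponential in the size, so this is only for small matrices.

```python
import numpy as np

def pf_expand(A):
    """Pfaffian by expansion along the first row, normalised so that the
    block-diagonal matrix diag([[0, 1], [-1, 0]], ...) has Pfaffian 1.
    Exponential time: only for small matrices."""
    A = np.asarray(A)
    m = A.shape[0]
    if m == 0:
        return 1
    if m % 2 == 1:
        return 0
    total = 0
    for j in range(1, m):
        keep = [i for i in range(m) if i not in (0, j)]
        total += (-1) ** (j + 1) * A[0, j] * pf_expand(A[np.ix_(keep, keep)])
    return total

def E(k):
    """E_{2k} = [[0, I_k], [-I_k, 0]]."""
    I, Z = np.eye(k, dtype=int), np.zeros((k, k), dtype=int)
    return np.block([[Z, I], [-I, Z]])

def Pf(A):
    """Pfaffian normalised as in Theorem 9.40, i.e. Pf(E_{2k}) = 1."""
    k = np.asarray(A).shape[0] // 2
    return pf_expand(E(k)) * pf_expand(A)    # pf_expand(E(k)) is +1 or -1

A = np.array([[0, 1, 2, 3], [-1, 0, 4, 5], [-2, -4, 0, 6], [-3, -5, -6, 0]])
print(Pf(A))                           # -8 = -a12*a34 + a13*a24 - a14*a23
print(int(round(np.linalg.det(A))))    # 64 = Pf(A)**2
```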


Corollary 9.41 The determinant of any symplectic matrix is 1.

Proof: Since $A^TE_{2k}A = E_{2k}$, Theorem 9.40 gives

$$1 = \mathrm{Pf}(E_{2k}) = \mathrm{Pf}(A^TE_{2k}A) = \det A \cdot \mathrm{Pf}(E_{2k}) = \det A.$$



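Theorem 9.42 The characteristic polynomial $p(\lambda) = \det(\lambda I - A)$ of a symplectic matrix A of order 2k satisfies

$$p(\lambda) = \lambda^{2k}p(\lambda^{-1}).$$

Proof: By Corollary 9.41, det A = 1, and $A = -E_{2k}(A^T)^{-1}E_{2k}$. Hence, for λ ≠ 0,

$$\begin{aligned} p(\lambda) = \det(\lambda I - A) &= \det\big(\lambda I + E_{2k}(A^T)^{-1}E_{2k}\big) = \det\big(E_{2k}(\lambda I - (A^T)^{-1})E_{2k}^{-1}\big) \\ &= \det\big(\lambda I - (A^T)^{-1}\big) = \det\big((A^T)^{-1}\big)\det(\lambda A^T - I) = \det(\lambda A^T - I) \\ &= \lambda^{2k}\det(A^T - \lambda^{-1}I) = \lambda^{2k}\det(\lambda^{-1}I - A) = \lambda^{2k}p(\lambda^{-1}). \end{aligned}$$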


Corollary 9.43 Let λ be an eigenvalue of a symplectic matrix A. Then so are $\lambda^{-1}$, $\bar{\lambda}$, and $\bar{\lambda}^{-1}$.

Proof: Since A is non-singular, λ ≠ 0 and $p(\lambda^{-1}) = \lambda^{-2k}p(\lambda) = 0$. Since the coefficients of p are real, $p(\bar{\lambda}) = \overline{p(\lambda)} = 0$.
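Definition 9.16, Corollary 9.41, Theorem 9.42 and Corollary 9.43 can all be spot-checked on one example. The sketch below is our own construction, not taken from the text: a product of a shear $\begin{bmatrix} I & S \\ 0 & I \end{bmatrix}$ (S symmetric), a stretch $\mathrm{diag}(D, D^{-1})$, and the rotation $cI - sE_4$, each of which satisfies $A^TE_{2k}A = E_{2k}$. `np.poly` returns the coefficients of $\det(xI - A)$, so Theorem 9.42 says this coefficient list is a palindrome. A numerical check like this illustrates, but of course does not prove, the results.

```python
import numpy as np

def E(k):
    """E_{2k} = [[0, I_k], [-I_k, 0]], the matrix of g in a symplectic basis."""
    I, Z = np.eye(k), np.zeros((k, k))
    return np.block([[Z, I], [-I, Z]])

def is_symplectic(A):
    k = A.shape[0] // 2
    return np.allclose(A.T @ E(k) @ A, E(k))

def eigenvalues_paired(A, tol=1e-8):
    """Corollary 9.43: with lam, also 1/lam, conj(lam), 1/conj(lam) are eigenvalues."""
    eig = np.linalg.eigvals(A)
    return all(np.min(np.abs(eig - mu)) < tol
               for lam in eig
               for mu in (1 / lam, np.conj(lam), 1 / np.conj(lam)))

# A hypothetical example: a product of three standard symplectic matrices.
I2, Z2 = np.eye(2), np.zeros((2, 2))
S = np.array([[2.0, 1.0], [1.0, 3.0]])           # symmetric
D = np.diag([2.0, 0.5])
c, s = np.cos(0.3), np.sin(0.3)
shear = np.block([[I2, S], [Z2, I2]])
stretch = np.block([[D, Z2], [Z2, np.linalg.inv(D)]])
rotation = c * np.eye(4) - s * E(2)              # orthogonal and commutes with E_4
A = shear @ stretch @ rotation

E4 = E(2)
p = np.poly(A)                                         # coefficients of det(xI - A)
print(is_symplectic(A))                                # True
print(np.allclose(A, -E4 @ np.linalg.inv(A.T) @ E4))   # True
print(np.isclose(np.linalg.det(A), 1.0))               # True (Corollary 9.41)
print(np.allclose(p, p[::-1]))                         # True (Theorem 9.42)
print(eigenvalues_paired(A))                           # True (Corollary 9.43)
```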


9.12 Decomplexification and complexification 


Recall that a unitary (or Hilbert) space is a complex vector space $V$ with
a positive definite Hermitian form $g$. Since $\mathbb{R}$ is a subfield of $\mathbb{C}$, the complex
vector space $\mathbb{C}^n$ can be considered as a real vector space by restricting the
scalars to real numbers. In this section, we investigate the relationship
between real vector spaces and complex vector spaces. We denote by $\mathbb{C}^n_{\mathbb{R}}$
the real vector space obtained from $\mathbb{C}^n$ by restricting the scalars to
the real field. This space is called the decomplexification of $\mathbb{C}^n$.
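Let $\alpha = \{e_1, \ldots, e_n\}$ be a basis for $\mathbb{C}^n$, and let

$$\alpha_{\mathbb{R}} = \{e_1, \ldots, e_n, ie_1, \ldots, ie_n\}.$$

Then, for a vector $z = \sum z_ke_k \in \mathbb{C}^n$, write $z_k = x_k + iy_k \in \mathbb{C}$ with $x_k, y_k \in \mathbb{R}$. Thus

$$z = \sum z_ke_k = \sum x_ke_k + \sum y_k(ie_k).$$

Thus, $\alpha_{\mathbb{R}}$ spans $\mathbb{C}^n_{\mathbb{R}}$. Moreover, if $\sum x_ke_k + \sum y_k(ie_k) = 0$ for $x_k, y_k \in \mathbb{R}$,
then $\sum(x_k + iy_k)e_k = 0$, and so $x_k + iy_k = 0$ by the linear independence of $\alpha$ over $\mathbb{C}$, that is, $x_k = 0 = y_k$. Hence, $\alpha_{\mathbb{R}}$ is
a basis for $\mathbb{C}^n_{\mathbb{R}}$. Therefore,

$$\dim\mathbb{C}^n_{\mathbb{R}} = 2n = 2\dim\mathbb{C}^n.$$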
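A concrete check of $\dim \mathbb{C}^n_{\mathbb{R}} = 2n$: in this sketch we identify $\mathbb{C}^n$ with $\mathbb{R}^{2n}$ by sending $z = x + iy$ to $(x, y)$ (an identification chosen for illustration), take the columns of a random complex matrix as the basis $\alpha$ (an assumption; such columns are almost surely independent), and confirm that the $2n$ real vectors of $\alpha_{\mathbb{R}}$ have real rank $2n$. The helper `to_real` is hypothetical.

```python
import numpy as np


def to_real(z):
    """Send z = x + iy in C^n to (x, y) in R^{2n}."""
    z = np.asarray(z, dtype=complex)
    return np.concatenate([z.real, z.imag])


n = 3
rng = np.random.default_rng(1)
# Columns of alpha: a basis of C^n (random, almost surely linearly independent).
alpha = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# alpha_R = {e_1, ..., e_n, i e_1, ..., i e_n}, written as real 2n-vectors.
alpha_R = np.column_stack([to_real(e) for e in alpha.T] +
                          [to_real(1j * e) for e in alpha.T])
print(alpha_R.shape, np.linalg.matrix_rank(alpha_R))  # (6, 6) 6
```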




Note that when $\mathbb{C}^n$ is considered as a $2n$-dimensional real vector space, the
space $\mathbb{R}^n = \{(x_1, x_2, \ldots, x_n) : x_i \in \mathbb{R}\}$ is a subspace of $\mathbb{C}^n_{\mathbb{R}}$, but not of $\mathbb{C}^n$,
which is an $n$-dimensional complex vector space.

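Let $T : \mathbb{C}^m \to \mathbb{C}^n$ be a linear transformation. Then $T_{\mathbb{R}} : \mathbb{C}^m_{\mathbb{R}} \to \mathbb{C}^n_{\mathbb{R}}$ is
still linear, and

$$I_{\mathbb{R}} = I, \qquad (aT + bS)_{\mathbb{R}} = aT_{\mathbb{R}} + bS_{\mathbb{R}} \quad \text{for } a, b \in \mathbb{R}.$$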


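Let $\alpha = \{u_1, \ldots, u_m\}$ and $\beta = \{v_1, \ldots, v_n\}$ be bases for $\mathbb{C}^m$ and $\mathbb{C}^n$, respectively, and write $z_{jk} = x_{jk} + iy_{jk}$ with $x_{jk}, y_{jk} \in \mathbb{R}$. Then

$$T(u_k) = \sum_j z_{jk}v_j = \sum_j x_{jk}v_j + \sum_j y_{jk}(iv_j),$$
$$T(iu_k) = iT(u_k) = \sum_j iz_{jk}v_j = -\sum_j y_{jk}v_j + \sum_j x_{jk}(iv_j),$$

which shows that the matrix representation of $T$ with respect to $\alpha$ and $\beta$, and
that of $T_{\mathbb{R}}$ with respect to $\alpha_{\mathbb{R}}$ and $\beta_{\mathbb{R}}$, are

$$[T]_\alpha^\beta = [z_{jk}] = [x_{jk}] + i[y_{jk}] = X + iY, \qquad [T_{\mathbb{R}}]_{\alpha_{\mathbb{R}}}^{\beta_{\mathbb{R}}} = \begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}.$$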


Problem 9.23 Let $A = X + iY$, where $X, Y \in M_n(\mathbb{R})$, and let $B = \begin{bmatrix} X & -Y \\ Y & X \end{bmatrix}$.
Show that
$$\det B = |\det A|^2.$$
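The block form of $[T_{\mathbb{R}}]$ and the identity in Problem 9.23 can be observed numerically (this is a sanity check on one random instance, not a proof). The sketch assumes the standard bases, so that a vector $z = x + iy$ has real coordinates $(x, y)$ with respect to $\alpha_{\mathbb{R}}$; `realify_matrix` and `realify_vector` are hypothetical names.

```python
import numpy as np


def realify_matrix(A):
    """[T_R] for [T] = X + iY, in the bases alpha_R, beta_R: [[X, -Y], [Y, X]]."""
    X, Y = A.real, A.imag
    return np.block([[X, -Y], [Y, X]])


def realify_vector(z):
    """Coordinates of z = x + iy with respect to alpha_R: (x, y)."""
    return np.concatenate([z.real, z.imag])


rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
z = rng.standard_normal(3) + 1j * rng.standard_normal(3)
B = realify_matrix(A)

# T_R acts on real coordinates exactly as T acts on complex vectors.
print(np.allclose(B @ realify_vector(z), realify_vector(A @ z)))  # True
# Problem 9.23: det B = |det A|^2.
print(np.isclose(np.linalg.det(B), abs(np.linalg.det(A)) ** 2))   # True
```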


In the reverse direction, we may consider the complexification of a real vector
space, which restores a complex space from its decomplexification.


Definition 9.17 Let $V$ be a real vector space. A linear operator $J : V \to V$
satisfying the condition $J^2 = -\mathrm{Id}$ is called a complex structure on $V$.

For example, on $\mathbb{C}^n_{\mathbb{R}}$, the operator $J : \mathbb{C}^n_{\mathbb{R}} \to \mathbb{C}^n_{\mathbb{R}}$ given by $J(z) = iz$ is
called the canonical complex structure. Then for any $a = x + iy \in \mathbb{C}$,

$$az = xz + iyz = xz + yJ(z).$$


Theorem 9.44 Let $V$ be a real vector space with a complex structure $J$.
Define scalar multiplication on $V$ by complex numbers by

$$(x + iy)z = xz + yJ(z).$$

Then $V$ becomes a complex vector space $V^{\mathbb{C}}$ such that $(V^{\mathbb{C}})_{\mathbb{R}} = V$.




Proof: We prove here the associativity of scalar multiplication; all other
properties can be easily verified. Using $J^2 = -\mathrm{Id}$,

$$\begin{aligned}
(x + iy)[(u + iv)z] &= (x + iy)[uz + vJ(z)] \\
&= x[uz + vJ(z)] + yJ[uz + vJ(z)] \\
&= xuz + xvJ(z) + yuJ(z) - yvz \\
&= (xu - yv)z + (xv + yu)J(z) \\
&= [(xu - yv) + i(xv + yu)]z = [(x + iy)(u + iv)]z.
\end{aligned}$$


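Corollary 9.45 Let $V$ be a real vector space with a complex structure $J$.
Then $\dim_{\mathbb{R}} V = 2n$ is even, and the matrix of $J$ in an appropriate basis $\alpha$
is of the form

$$[J]_\alpha = \begin{bmatrix} 0 & -I_n \\ I_n & 0 \end{bmatrix}.$$

Proof: Note that $\dim_{\mathbb{R}}(V^{\mathbb{C}})_{\mathbb{R}} = 2\dim_{\mathbb{C}} V^{\mathbb{C}}$. Choose a basis $\alpha = \{v_1, \ldots, v_n\}$
for $V^{\mathbb{C}}$ over $\mathbb{C}$; the matrix of multiplication by $i$ in this basis is
$iI_n$. Since $J(v_k) = iv_k$ and $J(iv_k) = -v_k$, the matrix of $J$ in $\alpha_{\mathbb{R}} = \{v_1, \ldots, v_n, iv_1, \ldots, iv_n\}$ of $V$ is
the required one.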
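In coordinates, the canonical structure and the scalar multiplication of Theorem 9.44 look as follows. This sketch identifies $\mathbb{C}^n_{\mathbb{R}}$ with $\mathbb{R}^{2n}$ via $z = x + iy \mapsto (x, y)$ (our choice), so that $J(z) = iz$ has the block matrix of Corollary 9.45; `canonical_J` and `scalar_mult` are hypothetical names. It checks $J^2 = -\mathrm{Id}$ and that the rule $(x + iy)v = xv + yJ(v)$ agrees with ordinary complex scaling.

```python
import numpy as np


def canonical_J(n):
    """Matrix of J(z) = iz on C^n_R in the basis alpha_R: [[0, -I_n], [I_n, 0]]."""
    I, Z = np.eye(n), np.zeros((n, n))
    return np.block([[Z, -I], [I, Z]])


def scalar_mult(c, v, J):
    """Theorem 9.44's rule (x + iy) v := x v + y J(v), for a real 2n-vector v."""
    return c.real * v + c.imag * (J @ v)


n = 2
J = canonical_J(n)
print(np.allclose(J @ J, -np.eye(2 * n)))  # True: J^2 = -Id

# Compare with ordinary complex scaling on C^n, written in real coordinates (x, y).
z = np.array([1 + 2j, -3 + 0.5j])
c = 0.7 - 1.2j
v = np.concatenate([z.real, z.imag])
w = c * z
print(np.allclose(scalar_mult(c, v, J), np.concatenate([w.real, w.imag])))  # True
```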


Remarks: (1) Let $V$ be a complex space. For a real linear transformation
$S : V_{\mathbb{R}} \to V_{\mathbb{R}}$, when can we find a complex linear transformation $T : V \to V$
such that $T_{\mathbb{R}} = S$? If such a $T$ exists, then $S$ must commute with $J$ on $V_{\mathbb{R}}$, since

$$S(iv) = S(Jv) = iS(v) = JS(v) \quad \text{for all } v \in V.$$

This condition is also sufficient, since

$$\begin{aligned}
S((x + iy)v) &= xS(v) + yS(iv) = xS(v) + yS(Jv) \\
&= xS(v) + yJS(v) = (x + yJ)S(v) = (x + iy)S(v).
\end{aligned}$$


(2) For an even-dimensional real space $V$, when does there exist a complex
structure $J$ such that a given real operator $S$ is the decomplexification
of a complex linear transformation $T$ on $V^{\mathbb{C}}$ constructed from $J$? In
the case $\dim_{\mathbb{R}} V = 2$, we have a partial answer: $J$ exists if $S$ has no
eigenvectors in $V$. In this case, $S$ has two complex conjugate eigenvalues
$\lambda \pm i\mu$ with $\lambda, \mu \in \mathbb{R}$, $\mu \ne 0$. Let

$$J = \mu^{-1}(S - \lambda I).$$




Then $S^2 - 2\lambda S + (\lambda^2 + \mu^2)I = 0$ by the Cayley-Hamilton Theorem 8.13.
Thus,

$$J^2 = \mu^{-2}(S^2 - 2\lambda S + \lambda^2 I) = -\mathrm{Id}.$$

Moreover, $J$ commutes with $S$, since it is a polynomial in $S$.
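Remark (2) can be tried on a concrete operator. The matrix $S$ below is a hypothetical example chosen so that its characteristic polynomial $\lambda^2 - 4\lambda + 5$ has roots $2 \pm i$ and hence $S$ has no real eigenvectors. The sketch takes $\mu$ to be the positive imaginary part (either sign of $\mu$ gives a complex structure) and checks the Cayley-Hamilton relation, $J^2 = -\mathrm{Id}$, and $JS = SJ$.

```python
import numpy as np

# Hypothetical example: a real 2x2 operator with non-real eigenvalues 2 +- i.
S = np.array([[1.0, -2.0],
              [1.0, 3.0]])
eigs = np.linalg.eigvals(S)                # lambda +- i*mu
lam, mu = eigs[0].real, abs(eigs[0].imag)  # choose mu > 0
assert mu > 0, "S must have non-real eigenvalues"

I = np.eye(2)
J = (S - lam * I) / mu                     # J = mu^{-1} (S - lambda I)

print(np.allclose(S @ S - 2 * lam * S + (lam**2 + mu**2) * I, 0))  # True (Cayley-Hamilton)
print(np.allclose(J @ J, -I))              # True: J^2 = -Id
print(np.allclose(J @ S, S @ J))           # True: J commutes with S
```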


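Remark: Let $\alpha = \{e_1, \ldots, e_n\}$ be an orthonormal basis for a unitary space
$(V, g)$ and $\alpha_{\mathbb{R}} = \{e_1, \ldots, e_n, ie_1, \ldots, ie_n\}$ for $V_{\mathbb{R}}$. Then for any
$z = \sum z_je_j \in V$ with $z_j = x_j + iy_j$,

$$\|z\|^2 = \sum_j |z_j|^2 = \sum_j (x_j^2 + y_j^2),$$

which is the squared Euclidean length of the coordinate vector of $z = \sum_j \big(x_je_j + y_j(ie_j)\big)$ with respect to $\alpha_{\mathbb{R}}$.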


Thus, $V_{\mathbb{R}}$ has the unique structure of a Euclidean space. However, the inner
products in $V$ do not coincide with the inner products in $V_{\mathbb{R}}$, since the latter
take only real values, while the former take complex values. In fact,
a Hermitian inner product $g$ on a complex vector space $V$ induces both a
symmetric and a skew-symmetric form on $V_{\mathbb{R}}$ as follows: Let

$$a(u, v) = \operatorname{Re} g(u, v), \qquad b(u, v) = \operatorname{Im} g(u, v).$$
Theorem 9.46 The two bilinear forms $a$ and $b$ on $V_{\mathbb{R}}$ satisfy the following
properties:

(1) $a$ is symmetric, $b$ is skew-symmetric, and both are invariant under
multiplication by $i$, which is the canonical complex structure $J$ on
$V_{\mathbb{R}}$:

$$a(iu, iv) = a(u, v), \qquad b(iu, iv) = b(u, v).$$

(2) $a(u, v) = b(iu, v)$, and $b(u, v) = -a(iu, v)$.

(3) Any pair of $i$-invariant bilinear forms $a$ and $b$, symmetric and skew-symmetric
respectively, satisfying (2) above defines a Hermitian inner product $g$
on $V$ by the formula

$$g(u, v) = a(u, v) + ib(u, v).$$

(4) The form $g$ is positive definite if and only if $a$ is positive definite.


Proof: Note that $g(v, u) = \overline{g(u, v)}$ is equivalent to

$$a(v, u) + ib(v, u) = a(u, v) - ib(u, v),$$




which implies that $a$ is symmetric and $b$ is skew-symmetric. Moreover,
$g(iu, iv) = i\bar{i}\,g(u, v) = g(u, v)$ shows that $a$ and $b$ are $i$-invariant. Next,

$$a(iu, v) + ib(iu, v) = g(iu, v) = ig(u, v) = -b(u, v) + ia(u, v)$$

implies (2) and (3). Finally, by skew-symmetry of $b$, we have $b(u, u) = 0$ and so $g(u, u) =
a(u, u)$, which implies (4).
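The four properties of Theorem 9.46 can be checked on random vectors. This sketch assumes the standard Hermitian inner product on $\mathbb{C}^3$, taken linear in the first argument and conjugate-linear in the second, which matches $g(iu, v) = ig(u, v)$ as used in the proof; NumPy's `vdot` conjugates its first argument, so `np.vdot(v, u)` computes $\sum_j u_j\bar{v}_j$.

```python
import numpy as np


def g(u, v):
    """Standard Hermitian inner product: linear in u, conjugate-linear in v."""
    return np.vdot(v, u)  # vdot conjugates its first argument: sum(conj(v) * u)


def a(u, v):
    return g(u, v).real   # symmetric part


def b(u, v):
    return g(u, v).imag   # skew-symmetric part


rng = np.random.default_rng(3)
u = rng.standard_normal(3) + 1j * rng.standard_normal(3)
v = rng.standard_normal(3) + 1j * rng.standard_normal(3)

checks = {
    "a symmetric": np.isclose(a(u, v), a(v, u)),
    "b skew-symmetric": np.isclose(b(u, v), -b(v, u)),
    "a i-invariant": np.isclose(a(1j * u, 1j * v), a(u, v)),
    "b i-invariant": np.isclose(b(1j * u, 1j * v), b(u, v)),
    "a(u,v) = b(iu,v)": np.isclose(a(u, v), b(1j * u, v)),
    "b(u,v) = -a(iu,v)": np.isclose(b(u, v), -a(1j * u, v)),
    "g(u,u) = a(u,u) > 0": np.isclose(g(u, u), a(u, u)) and a(u, u) > 0,
}
print(all(checks.values()))  # True
```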


Corollary 9.47 Let $\alpha = \{e_1, \ldots, e_n\}$ be an orthonormal basis for a unitary
space $(V, g)$. Then $\alpha_{\mathbb{R}} = \{e_1, \ldots, e_n, ie_1, \ldots, ie_n\}$ is an orthonormal basis
for $(V_{\mathbb{R}}, a)$ and a symplectic basis for $(V_{\mathbb{R}}, b)$.

Conversely, let $V$ be a $2n$-dimensional real vector space with a Euclidean
inner product $a$ and a symplectic form $b$, as well as a basis
$\alpha_{\mathbb{R}} = \{e_1, \ldots, e_n, e_{n+1}, \ldots, e_{2n}\}$ which is orthonormal for $a$ and symplectic
for $b$. Then we can construct a unitary space $(V^{\mathbb{C}}, g)$ with an orthonormal
basis $\alpha = \{e_1, \ldots, e_n\}$ by defining a complex structure $J$ as

$$J(e_j) = e_{n+j}, \qquad J(e_{n+j}) = -e_j \qquad \text{for } 1 \le j \le n,$$

and an inner product $g$ as

$$g(u, v) = a(v, u) + ib(v, u).$$


9.13 Exercises 


9.1. Find the matrix representing each of the following quadratic forms:
(1) $x_1^2 + 4x_1x_2 + 3x_2^2$,
(2) £? — a3 + £3 + 4r1g£3 — 522%, 
(3) $x_1^2 - 2x_2^2 - 3x_3^2 + 4x_1x_2 + 6x_1x_3 - 8x_2x_3$,
(4) $3x_1y_1 - 2x_1y_2 + 5x_2y_1 + 7x_2y_2 - 8x_2y_3 + 4x_3y_2 - x_3y_3$.


9.2. Sketch the graph of each of the following quadratic equations:
(1) $xy = 2$,
(2) $53x^2 - 72xy + 32y^2 = 80$,
(3) $16x^2 - 24xy + 9y^2 - 60x - 80y + 100 = 0$.




7 
9.3. Let q be a quadratic form on R and let A = | 4 
5 


matrix representing q with respect to the basis 
a={(1, 0, 1), (1, 1, 0), (0, 0, 1)}. 
(1) Diagonalize $A$, i.e., find an orthogonal matrix $P$ such that $P^TAP$ is a
diagonal matrix.

(2) Construct a basis $\beta$ for $\mathbb{R}^3$ such that the elements of $\beta$ are the principal
axes of the quadratic surface $q(x) = 0$.


9.4. For a given quadratic equation $ax^2 + 2bxy + cy^2 + dx + ey + f = 0$ with $b \ne 0$,
classify the conic section according to the various possible cases of $a$, $b$, and
$c$ (see Example 9.5).


9.5. For a positive definite quadratic form $q(x) = ax^2 + 2bxy + cy^2$, the curve
$q(x) = 1$ is an ellipse. When $a = c = 2$ and $b = -1$, sketch the ellipse.


9.6. Show that if A and B are both positive definite, so are A?, AT! and A+ B. 
9.7. Prove that if A and B are symmetric and positive definite, so is A? + B71. 


9.8. Find a substitution $x = Qy$ that diagonalizes each of the following quadratic
forms, where $Q$ is orthogonal. Also, classify each form as positive definite,
positive semidefinite, and so on.

(1) $q(x) = 2x^2 + 6xy + 2y^2$.
(2) $q(x) = x^2 + y^2 + z^2 + 2(xy + xz + yz)$.


9.9. Determine whether or not each of the following matrices is positive definite: 


r 22s ee] 1 0 1 
G)A4=| -1 2 esi lle @BA=|010 
=i i. 2 1 0 1 


Use the decomposition $A = LDL^T$ to write $x^TAx$ as a sum of squares.
9.10. Let $b$ be a bilinear form on $\mathbb{R}^2$ defined by

$$b(x, y) = 2x_1y_1 - 3x_1y_2 + x_2y_2.$$

(1) Find the matrix $A$ of $b$ with respect to the basis $\alpha = \{(1, 0), (1, 1)\}$.
(2) Find the matrix $B$ of $b$ with respect to the basis $\beta = \{(2, 1), (1, -1)\}$.
(3) Find the transition matrix $Q$ from the basis $\beta$ to the basis $\alpha$ and verify
that $B = Q^TAQ$.


9.11. Find the signature, index and rank of each of the following symmetric ma- 
trices: 


aft a3], @)3s 1 | 


ar aged gee ee ee 


9.12. Which of the following functions $b$ on $\mathbb{R}^2$ are bilinear forms?




(1) b(x, y) = 

(2) B(x, y) = > — yi)? + T2y2 

(3) $b(x, y) = (x_1 + y_1)^2 - (x_1 - y_1)^2$
(4) $b(x, y) = x_1y_2 - x_2y_1$


. For a bilinear form on R? defined by b(x, y) = 21y1 + z242, find the matrix 


representation of b with respect to each of the following bases: 


a= {(1, 0), (0, 1)}, B=({(1, -1), (1, 1)}, y= {(1, 2), (3, 4)}. 


9.14. Which of the following bilinear forms on $\mathbb{R}^3$ are symmetric or skew-symmetric?
For each symmetric one, find its matrix representation in
diagonal form, and for each skew-symmetric one, find its matrix representation
in the block form of Theorem 9.35.

(1) $b(x, y) = x_1y_3 + x_3y_1$

(2) $b(x, y) = x_1y_1 + 2x_1y_3 + 2x_3y_1 - x_2y_2$

(3) $b(x, y) = x_1y_2 + 2x_1y_3 - x_2y_3 - x_2y_1 - 2x_3y_1 + x_3y_2$

(4) $b(x, y) = \sum_{i,j=1}^{3}(i - j)x_iy_j$


9.15. Determine whether each of the following functions has a local minimum,
local maximum or saddle point at the given point:
(1) $f(x, y) = -1 + 4(e^x - x) - 5x\sin y + 6y^2$ at the point $(x, y) = (0, 0)$;
(2) $f(x, y) = (x^2 - 2x)\cos y$ at $(x, y) = (1, \pi)$.


9.16. Show that the quadratic form $q(x) = 2x^2 + 4xy + y^2$ has a saddle point at
the origin, despite the fact that its coefficients are positive. Show that $q$ can
be written as the difference of two perfect squares.


9.17. Find the eigenvalues of the following matrices and the maximum value of the
associated quadratic forms on the unit sphere.


i tee TL) aves a] 


| o rent | o = ali 


| 23 0| 


lo os] 


9.18. Determine whether the following statements are true or false, in general, and
justify your answers.

(1) For any quadratic form $q$ on $\mathbb{R}^n$, there exists a basis $\alpha$ for $\mathbb{R}^n$ with
respect to which the matrix representation of $q$ is diagonal.

(2) Any two matrix representations of a quadratic form have the same
inertia.

(3) The sum of two bilinear forms on $V$ is also a bilinear form.

(4) If $A$ is a real symmetric positive definite matrix, then the solution set
of $x^TAx = 1$ is an ellipsoid.

(5) For any nontrivial bilinear form $b \ne 0$ on $V$, if $b(v, v) = 0$, then $v = 0$.

(6) Any symmetric matrix is congruent to a diagonal matrix.

(7) Any two congruent matrices have the same eigenvalues.




(8) Any two congruent matrices have the same determinant. 
(9) Any matrix representation of a bilinear form is diagonalizable. 
(10) If a real symmetric matrix A is both positive semidefinite and negative
semidefinite, then A must be the zero matrix. 
(11) Any two similar real symmetric matrices have the same signature. 


Hints or Answers to Exercises 


Chapter 1 
Problems 
1.2 (1) Inconsistent. 
(2) $(x_1, x_2, x_3, x_4) = (-1 - 4t, 6 - 2t, 2 - 3t, t)$ for any $t \in \mathbb{R}$.
1.3 (1) $(x, y, z) = (t, -t, t)$. (3) $(w, x, y, z) = (2, 0, 1, 3)$.
1.4 (1) bi + bg = b3 = 0. (2) For any b,’s. 
1.7 a=-%,b=8B, c= 8, d= —4. 
: ; A EnA Al Ze SL _|8 7 
1.9 Consider the matrices: A = | 3 6 , B= | 3 al C= | 01 | 
1.10 Compare the diagonal entries of AAT and ATA. 
1.12 (1) Infinitely many for $a = 4$, exactly one for $a \ne \pm 4$, and none for $a = -4$.
(2) Infinitely many for $a = 2$, none for $a = -3$, and exactly one otherwise.
1.14 (3) $I = I^T = (AA^{-1})^T = (A^{-1})^TA^T$ means by definition $(A^T)^{-1} = (A^{-1})^T$.


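1.17 Any permutation on $n$ objects can be obtained by taking a finite number of
interchanges of two objects.
1.21 Consider the case that some $d_i$ is zero.
1.22 $x = 2$, $y = 3$, $z = 1$.
1.23 No, in general. Yes, if the system is consistent.
1.24 $L = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 1 & 0 \\ 0 & -1 & 1 \end{bmatrix}$, $U = \begin{bmatrix} 1 & -1 & 0 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{bmatrix}$.
1.25 (1) Consider the $(i, j)$-entries of $AB$ for $i < j$.
(2) $A$ can be written as a product of lower triangular elementary matrices.
1.26 $L = \begin{bmatrix} 1 & 0 & 0 \\ -1/2 & 1 & 0 \\ 0 & -2/3 & 1 \end{bmatrix}$, $D = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 3/2 & 0 \\ 0 & 0 & 4/3 \end{bmatrix}$, $U = \begin{bmatrix} 1 & -1/2 & 0 \\ 0 & 1 & -2/3 \\ 0 & 0 & 1 \end{bmatrix}$.
1.27 There are four possibilities for $P$.
1.28 (1) $I_1 = 0.5$, $I_2 = 6$, $I_3 = 0.55$. (2) $I_1 = 0$, $I_2 = I_3 = 1$, $I_4 = I_5 = 5$.
1.30 $x = k\begin{bmatrix} 0.35 \\ 0.40 \\ 0.25 \end{bmatrix}$ for $k > 0$.
1.31 $A = \begin{bmatrix} 0.0 & 0.1 & 0.8 \\ 0.4 & 0.7 & 0.1 \\ 0.5 & 0.0 & 0.1 \end{bmatrix}$ with $d = \begin{bmatrix} 90 \\ 10 \\ 30 \end{bmatrix}$.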



Exercises 1.10 




1.1 Row-echelon forms are A, B, D, F. Reduced row-echelon forms are A, B, F. 


1 -3 2 1 2 
0 0 1 -1/4 3/4 
PG ace 0 
0 00 0 0 
1 -3 0 3/2 1/2 
0 0 1 -1/4 3/4 
Ve) 0 0 0 p s 
0 00 0 0 
1.4 (1) $x_1 = 0$, $x_2 = 1$, $x_3 = -1$, $x_4 = 2$. (2) $x = 17/2$, $y = 3$, $z = -4$.
1.5 (1) and (2).
1.6 For any $b_i$'s.
1.7 $b_1 - 2b_2 + 5b_3 \ne 0$.
1.8 (1) Take $x$ to be the transpose of each row vector of $A$.
1.10 Try it with several kinds of diagonal matrices for $B$.


1.11 $A^k = \begin{bmatrix} 1 & 2k & 3k(k-1) \\ 0 & 1 & 3k \\ 0 & 0 & 1 \end{bmatrix}$.


1.13 See Problem 1.11.


1.15 (1) AAB = B. (2) ATAC=CH=A+I. 
1.16 a=0,c !=b#40. 
1 =i 0 0 
13/8 
|0 1/2 -1/2 0 25 
117 At=| 9 "5 1/3 —1/3 ‘Bree ee 
0 0 0 1/4 
8 -19 2 
1.18 A= 4| 1 -23 4 
4 -2 1 
1/3 1/6 1/6 2 
1.21 (1)x=A ‘b= | —4/3 —5/3 4/3 5S 
—1/3 —2/3 1/3 7 
1 0 2 0 1 1/2 
aoa I A eara 
100 10 0 1 2 3 
1.23 (1) A=|2 1 0 02 0 ee Ce a 
RA 0 0 -1 0 0 1 
ale ME 0 iN | 
b/a 1 0 d-b/a 0 a 
1.24 c=[2 — 13], x= [423]. 
1 00 1 0 0 1 1 1 
1.25 (2)A=]1 1 0 0 3 0 0 1 4/3 
1 1 | 0 |o 0 l 


-1/2 —1/8 
1/2 3/8 
0 —1/4 
8/3 
-5/3 
—5/3 
L=A,D=U=I 




(1) (Al (AS Oy ArT = 0 if A € Mnxn- 
Heee + ART) = I AF. 


. (2) A= AA? = A'A =I. 


) 
5 |: (4) Take B = —A. 


(@) A=[) : ond B=| 5 ‘a 


om=|] olmam] kasasi alk 


True: (1). (3). (5). (7). (10). (12) (A)? = (AT)“1 = AM}, 


Chapter 2 


Problems 


2.2 (2), (4).
2.3 (1), (2), (4).
2.5 See Problem 1.11. For any square matrix $A$, $A = \frac{A + A^T}{2} + \frac{A - A^T}{2}$.
2.6 Note that any vector $v$ in $W$ is of the form $a_1x_1 + a_2x_2 + \cdots + a_mx_m$, which
is a vector in $U$.
2.7 $\operatorname{tr}(AB - BA) = 0$.
2.10 Linearly dependent.
2.12 Any basis for $W$ must be a basis for $V$ already, by Corollary 2.13.
2.13 (1) $n - 1$, (2) BD, (3) 20.
2.15 $63a + 39b - 13c + 5d = 0$.
2.16 If $b_1, \ldots, b_n$ denote the column vectors of $B$, then $AB = [Ab_1 \cdots Ab_n]$.
2.17 Consider the matrix $A$ from Example 2.19.
2.19 (1) rank = 3, nullity = 1. (2) rank = 2, nullity = 2.
2.20 $Ax = b$ has a solution if and only if $b \in C(A)$.
2.21 (1) and (2) are clear, since $Bx = 0$ implies $(AB)x = A(Bx) = 0$.
(3) For an $m \times n$ matrix $A$ and an $n \times p$ matrix $B$,
$$C(AB) = \{ABx : x \in \mathbb{R}^p\} \subseteq \{Ay : y \in \mathbb{R}^n\} = C(A),$$
because $Bx \in \mathbb{R}^n$ for any $x \in \mathbb{R}^p$.
(4) $R(AB) = C((AB)^T) = C(B^TA^T) \subseteq C(B^T) = R(B)$.
2.22 $A^{-1}(AB) = B$ implies $\operatorname{rank} B = \operatorname{rank} A^{-1}(AB) \le \operatorname{rank}(AB)$.
2.23 By (2) of Theorem 2.22 and Corollary 2.21, a matrix $A$ of rank $r$ must have
an invertible submatrix $C$ of rank $r$. By (1) of the same theorem, the rank
of $C$ must be the largest.
2.25 $\dim(V + W) = 4$ and $\dim(V \cap W) = 1$.




2.26 A basis for $V$ is $\{(1,0,0,0), (0,-1,1,0), (0,-1,0,1)\}$,
for $W$: $\{(-1,1,0,0), (0,0,2,1)\}$, and for $V \cap W$: $\{(3,-3,2,1)\}$. Thus,
$\dim(V + W) = 4$ means $V + W = \mathbb{R}^4$, and any basis for $\mathbb{R}^4$ works for $V + W$.
l 1000 a 1 1 
OT 0 A E 28 
REA a D ee | RT oa lg 
0 1 2 3 d 4 0 
Exercises 2.10 
2.1 Consider 0(1, 1). 
2.5 (1), (4). 
2.6 No. 
2.7 (1) p(x TR: 1(x) + 8p2(x) — 2p3 (x). 
2.11 {(1,1,0),(1,0,1)}. 
2.12 2. 
2.13 Consider $\{e_i = \{a_j\}_{j=1}^{\infty}\}$, where $a_j = 1$ if $i = j$ and $a_j = 0$ otherwise.
2.14 (1) $0 = c_1Ab^1 + \cdots + c_pAb^p = A(c_1b^1 + \cdots + c_pb^p)$ implies $c_1b^1 + \cdots + c_pb^p = 0$
since $N(A) = \{0\}$, and this also implies $c_i = 0$ for all $i = 1, \ldots, p$ since the columns
of $B$ are linearly independent.
(2) $B$ has a right inverse. (3) and (4): Look at (1) and (2) above.
2.15 (1) $\{(-5, 3, 1)\}$. (2) 3.
2.16 5!, and dependent.
2.17 (1) $R(A) = \langle (1,2,0,3), (0,0,1,2) \rangle$, $C(A) = \langle (5,0,1), (0,5,2) \rangle$,
$N(A) = \langle (-2,1,0,0), (-3,0,-2,1) \rangle$.
(2) $R(B) = \langle (1,1,-2,2), (0,2,1,-5), (0,0,0,1) \rangle$,
$C(B) = \langle (1,-2,0), (0,1,1), (0,0,1) \rangle$, $N(B) = \langle (5,-1,2,0) \rangle$.
2.18 rank = 2 when $x = -3$, rank = 3 when $x \ne -3$.
2.20 Since $uv^T = u[v_1 \cdots v_n] = [v_1u \cdots v_nu]$, each column vector of $uv^T$ is of
the form $v_iu$, that is, $u$ spans the column space. Conversely, if $A$ is of rank
1, then the column space is spanned by any one nonzero column of $A$, say the first
column $u$ of $A$, and the remaining columns are of the form $v_iu$, $i = 2, \ldots, n$.
Take $v = [1\ v_2 \cdots v_n]^T$. Then one can easily see that $A = uv^T$.
2.21 False: (1) [0] is not in the set.


OE E E (| : , | E b and B= Terenos: 


Snes |b 0080 _f1 0 
(5) Consider A = | 10 and B = | 0 0 | 
(8) For any $x \in V - U$, $x - x = 0 \in U$. (9) $U = \{re_1 \mid r \in \mathbb{R}\}$ and $V =
\{re_2 \mid r \in \mathbb{R}\}$, $\dim U = \dim V = 1$.
True: (Prove) (4). (6). (7). (10).


Chapter 3 
Problems 

3.1 To show $W$ is a subspace, see Theorem 3.2. Let $E_{ij}$ be the matrix with 1
at the $(i, j)$-th position and 0 at others. Let $F_k$ be the matrix with 1 at the
$(k, k)$-th position, $-1$ at the $(n, n)$-th position and 0 at others. Then the
set $\{E_{ij}, F_k : 1 \le i \ne j \le n,\ k = 1, \ldots, n - 1\}$ is a basis for $W$. Thus
$\dim W = n^2 - 1$.

3.2 $\operatorname{tr}(AB) = \sum_i\sum_k a_{ik}b_{ki} = \sum_k\sum_i b_{ki}a_{ik} = \operatorname{tr}(BA)$.

3.3 | r i , since it is simply the change of coordinates x and y. 

3.4 If yes, (2, 1) = T(—6, —2, 0) = —2T(3, 1, 0) = (-2, —2). 

3.5 If $a_1v^1 + a_2v^2 + \cdots + a_kv^k = 0$, then
$0 = T(a_1v^1 + a_2v^2 + \cdots + a_kv^k) = a_1w^1 + a_2w^2 + \cdots + a_kw^k$ implies $a_i = 0$
for $i = 1, \ldots, k$.

3.6 If $v \in \ker(T)$, then $T(v) = 0 = T(0)$. Thus, $v = 0$ if $T$ is one-to-one. Conversely, if
$T(u) = T(v)$, then $T(u) - T(v) = T(u - v) = 0$. Thus $u - v \in \ker(T) = \{0\}$,
and so $u = v$.

3.7 (See Problem 2.21) (1) If $v \in \ker(T)$, then $T(v) = 0$ and so $S \circ T(v) = 0$
implies $v \in \ker(S \circ T)$. (2) If $w \in \operatorname{im}(S \circ T)$, then $S \circ T(v) = w$ for some $v$.
Thus, $w = S(T(v))$ and so $w \in \operatorname{im}(S)$.

3.10 Use rotation Rz and reflection | i = about the x-axis. 
3.11 (1) (5, 2, 3). (2) (2, 3, 0). 
2 -3 4 0 7 4 
3.12 (1) (Tla=|5 -1 2 |, T]la=|2 -1 5 
4 70 4 =g 2 
12 0 0 
3.13 T\3 =| 1 0 -3 1 
02 3 4 
3.0 0 3 2 0 
3.15 [S+T]a=] 2 2 3 |,[ToS]4=] 3 3 3 
2 3 3 6 5 38 
1 -1 0 2 3 0 
3.16 [S]Z = | 1 1 | 0 3 6 
0 0 1 0 0 4 
g 1 0 =i 1 0 
Sag aiei J IT TE d 
1 0 -1 5 —2 -3 7 
3.18 d =3|4 3 -1],[dg=}] 3 5 -10 
2 1 1 1 1 -2 
1 2 1 4 5 
319 [T]e=| 0 -1 0 |, T]ls=| -1 -2 -6 
1 0 4 1 1 5 




3.20 Write $B = Q^{-1}AQ$ with some invertible matrix $Q$.
(1) $\det B = \det(Q^{-1}AQ) = \det Q^{-1}\det A\det Q = \det A$. (2) $\operatorname{tr}(B) =
\operatorname{tr}(Q^{-1}AQ) = \operatorname{tr}(QQ^{-1}A) = \operatorname{tr}(A)$ (see Problem 3.2). (3) Use Problem 2.22.
3.22
a* = {fı(z, Y, z) =g% žy, folz, Y, z) = iy, fa(a, Y, z) = -r +z}. 


Exercises 3.10 


3.1 
3.2 
3.5 
3.6 
3.7 
3.8 


3.9 


3.12 


3.16 


3.17 


3.18 
3.19 


3.20 


3.25 


3.26 


(2). 

az? + ba? + ax +c. 

(1) Consider the decomposition of v = vt Tw) +o) 
(1) {(a, 3”, 2r) E€ R? : x € R}. 


(2) T(r, s, t) = (r, 2r — s, Tr — 3s — t). 

(1) Since $T \circ S$ is one-to-one from $V$ into $V$, $T \circ S$ is also onto and so $T$ is
onto. Moreover, if $S(u) = S(v)$, then $T \circ S(u) = T \circ S(v)$ implies $u = v$.
Thus, $S$ is one-to-one, and so onto. This implies $T$ is one-to-one. In fact, if
$T(u) = T(v)$, then there exist $x$ and $y$ such that $S(x) = u$ and $S(y) = v$.
Thus $T \circ S(x) = T \circ S(y)$ implies $x = y$, and so $u = S(x) = S(y) = v$.

Note that $T$ cannot be one-to-one and $S$ cannot be onto.


0 0 1 -12 
0 0 0 1 
-1/3 2/3 
(1) | -5/3 A 
off loji- 
(1) TQ, 0, 0) = (4, 0), T, 1, 0) = (1, 3), TG, 1, 1) = (4, 3). 
(2) T(x, y, z) = (4x — 2y + z, y+ 22). 
1 1 1 0 1 0 
olo 1 ar 0 0 i 
0 0 1 0 0 0 
0 0 1 1 1 1 
Ceres Wa. 1 1|, 3Q=|1 1 oi epee 
1 -1 0 1 0 0 
Use the Cen a 
D| 4 49 | 
=2/3 1/3 4/3 
@| 4 a (4) | 2/3 -1/3 -1/3 
7/3 —2/3 -8/3 
0 2 1 
r=] -i 4 7 = ((T"ae)? 
1 0 1 
110 A aal 


3.27 N(T) = {0}, 
C(T) = ( (2,1,0, 1), (1,1,1, 1), (4,2,2,3) ), [T] = 


1 
3.29 p(x) =1+ x- įx?, p(x) = —3 + 52”, p(x) = —3 +2- 52”. 
3.30 False: (1) Consider T : R? — R? defined by T(z, y) = 
(5) Consider p(x) = 2. (7) For two bases a = {u;, u2} Æ 8 = {v1, vo} and 
T(u;) = v; so that T 4 Id, but |[T]} = I2. 
True: (2) dimIm(T) + dim Ker(T) = n > m > dim Im(T). 
(3) dimkerT = dim N ([T]}). (4) dimim(T) = dimC([T]a) = dim R([T]a). 
(6) By definition. (8) See Theorem 3.9. (9). 


Chapter 4 
Problems 


4.4 (1) —27, (2) 0, (3) (1 — z4)’. 

4.10 (1) —14. (2) 0. 

4.11 See Example 4.6, and use mathematical induction on n. 

4.12 Find the cofactor expansion along the first row first, and then compute the 
cofactor expansion along the first column of each n x n submatrix (in the 
second step, use the proof of Cramer’s rule). 

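4.16 If $A = 0$, then clearly $\operatorname{adj}A = 0$. Otherwise, use $A \cdot \operatorname{adj}A = (\det A)I$.

4.17 Use $\operatorname{adj}A \cdot \operatorname{adj}(\operatorname{adj}A) = \det(\operatorname{adj}A)I$.

4.18 (1) $x_1 = 4$, $x_2 = 1$, $x_3 = -2$.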

5 
(2) = BUH pE 


J A ` = a — detCiy _ 
4.19 The solution of the system Id(x) = x is x; = Yý = det A. 


Exercises 4.6 


4.1 $k = 0$ or 2.
4.2 It is not necessary to compute $A^2$ or $A^3$.
4.3 $-37$.
4.4 (1) $\det A = (-1)^{n-1}(n - 1)$. (2) 0.
4.5 $-2, 0, 1, 4$.
4.6 Consider $\sum_{\sigma \in S_n} a_{1\sigma(1)} \cdots a_{n\sigma(n)}$.


(1) 1, (2)24. 

4.8 (3) zı = 1, T2 = —1, 3 = De t4 = —2. 
(2) x = (3,0, 4/11)7. 

= 0 or +1. 




4.13 (3) $A_{11} = 2$, $A_{12} = 7$, $A_{13} = -8$, $A_{33} = 3$.


-3 5 9 
4.16 At=7| 18 -6 18]. 


6 14 -18 
2 -7 —6 
4.17 (1) adj(A)=| 1 -7 -3 |, det(A) =—7, det(adj(A)) = 49, 
-4 7 5 
Eod =i 
A-! = —Ladj(A). (2) adj(A)= | -10 4 2], 
7 eg ai 


det A = 2, det(adj(A)) = 4, A~* = Sadj(A). 
4.18 Note that $(AB)^{-1} = B^{-1}A^{-1}$ and $A^{-1} = \operatorname{adj}(A)/\det A$. (The reader may
also try to prove this equality for non-invertible matrices.)


4.19 If we set A= | : i | then the area is 4| det A| = 4. 
1 2 
4.20 If we set A= | 1 2 |, then the area is 4y/|det(AT A)| = 3v2, 
2 1 
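4.21 Use $\det A = \sum_{\sigma \in S_n}\operatorname{sgn}(\sigma)a_{1\sigma(1)} \cdots a_{n\sigma(n)}$. Suppose that, for a permutation
$\sigma \in S_n$, $\sigma(i) \in \{k+1, \ldots, n\}$ for some $i = 1, \ldots, k$. Then there is an
$\ell \in \{k+1, \ldots, n\}$ such that $\sigma(\ell) \in \{1, \ldots, k\}$. Thus $a_{\ell\sigma(\ell)} = 0$, and so
$\operatorname{sgn}(\sigma)a_{1\sigma(1)} \cdots a_{\ell\sigma(\ell)} \cdots a_{n\sigma(n)} = 0$. Therefore the only terms that do not
vanish are those for $\sigma : \{1, \ldots, k\} \to \{1, \ldots, k\}$. But then $\sigma : \{k+1, \ldots, n\} \to
\{k+1, \ldots, n\}$, i.e., $\sigma = \sigma_1\sigma_2$ for $\sigma_1 \in S_n$ fixing $\{k+1, \ldots, n\}$ and $\sigma_2 \in S_n$
fixing $\{1, \ldots, k\}$. Since $\operatorname{sgn}(\sigma) = \operatorname{sgn}(\sigma_1)\operatorname{sgn}(\sigma_2)$, we have

$$\det A = \sum_{\sigma_1 \in S_k}\operatorname{sgn}(\sigma_1)a_{1\sigma_1(1)} \cdots a_{k\sigma_1(k)} \cdot \sum_{\sigma_2 \in S_{n-k}}\operatorname{sgn}(\sigma_2)a_{k+1,\sigma_2(k+1)} \cdots a_{n\sigma_2(n)} = \det B\det D.$$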
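The counting argument in 4.21 can be watched directly by listing the terms of the permutation expansion. This sketch uses a hypothetical integer block upper triangular matrix $\begin{bmatrix} B & C \\ 0 & D \end{bmatrix}$ whose blocks $B$, $C$, $D$ have no zero entries, so a term vanishes exactly when it uses an entry of the zero block. Indices are 0-based in the code, so $\{1, \ldots, k\}$ becomes `range(k)`; the helper names are ours.

```python
from itertools import permutations
from math import prod


def sgn(p):
    """Sign of a permutation of 0..n-1, computed by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def leibniz_terms(A):
    """Map each sigma to the term sgn(sigma) * a_{1 sigma(1)} ... a_{n sigma(n)}."""
    n = len(A)
    return {p: sgn(p) * prod(A[i][p[i]] for i in range(n)) for p in permutations(range(n))}


def det(A):
    return sum(leibniz_terms(A).values())


k = 2
B = [[2, 1], [1, 3]]
C = [[5, 3], [4, 1]]
D = [[1, 2], [3, 7]]
A = [B[0] + C[0], B[1] + C[1], [0, 0] + D[0], [0, 0] + D[1]]  # [[B, C], [0, D]]

nonzero = [p for p, t in leibniz_terms(A).items() if t != 0]
# Every surviving sigma maps {0, ..., k-1} onto itself, as the hint argues.
print(all(set(p[:k]) == set(range(k)) for p in nonzero))  # True
print(det(A), det(B) * det(D))                             # 5 5
```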
4.22 Multipl a to the right 
: ultiply | 7 | to the right. 


4.23 $\operatorname{vol}(T(B)) = |\det(A)|\operatorname{vol}(B)$ for the matrix representation $A$ of $T$. Clearly,
$C = AB$, so $\operatorname{vol}(P(C)) = |\det(AB)| = |\det A||\det B| = |\det A|\operatorname{vol}(P(B))$.


4.24 False: (1) For A= | : d and B = E p | 9=det(A+ B) A det A+ 


det B = 5. (8) For A= | | deer- A) =0#2= 2- det. 


1 0 0 1 
o |*= forse 


(5) E= | nes | (8) det(kA) = k” det(A). (9 


1 

Bl 
1 0 1 

solution, but det(A) = 0. (13) Consider | 1 1 0 |. 
0 1 1 


(14) False when $\det(A) = 0$. (15) Note that a permutation matrix $A$ is the
product of finitely many simple permutation matrices (i.e., $A = P_1 \cdots P_k$, where
each $P_i$ is a permutation matrix with only two rows exchanged, which is its own
inverse and symmetric: $P_i^{-1} = P_i = P_i^T$). Thus $A^{-1} = P_k \cdots P_1 =
P_k^T \cdots P_1^T = (P_1 \cdots P_k)^T = A^T$. But $A^T$ need not be $A$. (16) See (1) above.
True: (2) $\det(AB) = \det(A)\det(B)$. (4) $(cI_n - A)^T = cI_n - A^T$.

(6) If $A^2 = -I_3$, then $(\det A)^2 = -1$. (7) $(\det A)^k = 0$. (10) See Exercise 2.20:
since $uv^T = [v_1u \cdots v_nu]$, $\det(uv^T) = (v_1 \cdots v_n)\det[u \cdots u] = 0$.
(11) $A(\operatorname{adj}A) = \det(A)I = I$, and so $\det(A)\det(\operatorname{adj}A) = \det(\operatorname{adj}A) = 1$ and
$A^{-1} = \operatorname{adj}A$. Thus, $\operatorname{adj}(\operatorname{adj}A) = (\operatorname{adj}A)^{-1} = A$. (12) $A^{-1} = \operatorname{adj}A$.
Chapter 5 
Problems 
5.1 Note that $\langle x, x\rangle = ax_1^2 + 2cx_1x_2 + bx_2^2 = a\left(x_1 + \frac{c}{a}x_2\right)^2 + \frac{ab - c^2}{a}x_2^2 > 0$ for all
$x = (x_1, x_2) \ne 0$. For $x = (1, 0)$, we get $a > 0$. For $x = (-\frac{c}{a}, 1)$, we get
$ab - c^2 > 0$. The converse is easy from the equation above.
5.2 $\langle x, y\rangle^2 = \langle x, x\rangle\langle y, y\rangle$ if and only if $\|tx + y\|^2 = \langle x, x\rangle t^2 + 2\langle x, y\rangle t + \langle y, y\rangle = 0$
has a repeated real root $t_0$.
5.3 (4) Compute the square of both sides and use the Cauchy-Schwarz inequality.
5.5 $\langle f, g\rangle = \int_0^1 f(x)g(x)\,dx$ defines an inner product on $C[0, 1]$. Use the Cauchy-Schwarz
inequality or Problem 5.3.
5.6 (1) VAG 1, —1), (2) ra (6, 4, —3). 
5.7 (1): Orthogonal, (2) and (3): None, (4): Orthonormal. 
5.10 $\{1, \sqrt{3}(2x - 1), \sqrt{5}(6x^2 - 6x + 1)\}$.
5.12 (1) is just the definition, and use (1) to prove (2). 
5.14 Projy(p) = (4, 3,—4). 
5.15 $P^T = (P^TP)^T = P^TP = P$, and $P = P^TP = P^2$.
5.16 For $x \in \mathbb{R}^m$, $x = \langle v^1, x\rangle v^1 + \cdots + \langle v^m, x\rangle v^m = (v^1v^{1T})x + \cdots + (v^mv^{mT})x$.
F 2 E2423 
5.17 The null space of the matrix o -1 11/38 
$x = t[1\ {-1}\ 1\ 0]^T + s[-4\ 1\ 0\ 1]^T$ for $t, s \in \mathbb{R}$.
5.19 $R(A)^{\perp} = N(A)$.
5.20 x= (1, —1, 0) + t(2, 1, —1) for any number t. 
5.22 For A = [v! v?], two columns are linearly independent. 
1 2 1 1 
5.23 P==] 1 2 -1 
Ped sed og 
5.25 -3 +z. 
So —0.4 
5.26 | vo | =x=(ATA)!ATb= | 0.35 
ig 16.1 
5.28 (1)r aie L a 1,b=4,c=4 




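5.29 Extend $\{v^1, \ldots, v^m\}$ to an orthonormal basis $\{v^1, \ldots, v^m, \ldots, v^n\}$. Then
$\|x\|^2 = \sum_{i=1}^{m}|\langle x, v^i\rangle|^2 + \sum_{j=m+1}^{n}|\langle x, v^j\rangle|^2$.
5.30 (1) orthogonal. (2) not orthogonal.
5.31 Let $A = QR = Q'R'$ be two decompositions of $A$. Then $Q^TQ' = RR'^{-1}$,
which is an upper triangular and orthogonal matrix. Since $(Q^TQ')^T =
(Q^TQ')^{-1} = (RR'^{-1})^{-1} = R'R^{-1}$ is also upper triangular, $Q^TQ'$ is both upper and lower
triangular. Thus $Q^TQ'$ is diagonal and orthogonal, so that $Q^TQ' = D = \operatorname{diag}[d_i]$ with $d_i = \pm 1$, i.e., $Q' = QD$,
or $u^{i\prime} = \pm u^i$ for each $i \ge 1$. Since $c^1 = b_{11}u^1 = b'_{11}u^{1\prime}$ with $b_{11}, b'_{11} > 0$,
and $u^1$, $u^{1\prime}$ are unit vectors in the $c^1$ direction, we have $u^{1\prime} = u^1$. Assume
that $u^{i\prime} = u^i$ for $i < j$ and $u^{j\prime} = -u^j$. Then $u^j$ becomes a linear combination
of $u^1, \ldots, u^{j-1}$, since $c^j = b_{1j}u^1 + \cdots + b_{jj}u^j = b'_{1j}u^{1\prime} + \cdots + b'_{jj}u^{j\prime}$ gives
$(b_{jj} + b'_{jj})u^j = \sum_{i<j}(b'_{ij} - b_{ij})u^i$ with $b_{jj} + b'_{jj} > 0$, which contradicts orthonormality. Thus
$u^{j\prime} = u^j$, or $d_j = 1$, for all $j \ge 1$, so that $D = I$. Thus we get $Q = Q'$, and
then $R = R'$ follows.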
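The uniqueness statement in 5.31 (for $R$ with positive diagonal) can be seen numerically. NumPy's `np.linalg.qr` does not promise positive diagonal entries in $R$, so the sketch multiplies by $D = \operatorname{diag}(\pm 1)$ exactly as in the argument above, and then compares with a classical Gram-Schmidt factorization, which produces a positive diagonal by construction. The random full-column-rank matrix and the helper `gram_schmidt_qr` are our assumptions for illustration.

```python
import numpy as np


def gram_schmidt_qr(A):
    """Classical Gram-Schmidt: A = QR with orthonormal columns in Q and
    positive diagonal in R (assumes A has full column rank)."""
    m, n = A.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for j in range(n):
        v = A[:, j].copy()
        for i in range(j):
            R[i, j] = Q[:, i] @ A[:, j]
            v -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R


rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))

Q1, R1 = np.linalg.qr(A)            # reduced QR; diag(R1) may contain negative entries
D = np.diag(np.sign(np.diag(R1)))   # D = diag(d_i) with d_i = +-1
Q1, R1 = Q1 @ D, D @ R1             # still A = Q1 R1, since D @ D = I; now diag(R1) > 0

Q2, R2 = gram_schmidt_qr(A)
print(np.allclose(Q1, Q2) and np.allclose(R1, R2))  # True: the two factorizations agree
```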


Exercises 5.12 


5.1 Inner products are (2), (4), (5).
5.2 For the last condition of the definition, note that $\langle A, A\rangle = \operatorname{tr}(A^TA) =
\sum_{i,j} a_{ij}^2 = 0$ if and only if $a_{ij} = 0$ for all $i, j$.
5.4 (1) $k = 3$.
5.5 (3) $\|f\| = \|g\| = 1/2$. The angle is 0 if $n = m$, and $\pi/2$ if $n \ne m$.
5.6 Use the Cauchy-Schwarz inequality and Problem 5.2 with $x = (a_1, \ldots, a_n)$
and $y = (1, \ldots, 1)$ in $(\mathbb{R}^n, \cdot)$.
1 
sa) 1, JB 
4 3 
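(2) If $\langle h, g\rangle = h\left(\frac{a}{3} + \frac{b}{2} + c\right) = 0$ with $h \ne 0$ a constant and $g(x) = ax^2 + bx + c$,
then $(a, b, c)$ is on the plane $\frac{a}{3} + \frac{b}{2} + c = 0$ in $\mathbb{R}^3$.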
3 1 
5.10 (1) V2 (2) 5 V2 
5.12 Orthogonal: (4). Nonorthogonal: (1), (2), (3). 


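5.16 Use induction on $n$. If $n = 1$, then $A$ has only one column $c^1$ and $A^TA =
\det(A^TA)$ is simply the square of the length of $c^1$. Assume the claim is true
for $n - 1$. Let $B_{m\times(n-1)}$ be the submatrix of $A$ with the first column $c^1$
removed so that $A = [c^1\ B]$, and let $C_{m\times n} = [a\ B]$, where $a = c^1 - p$ and
$p = \operatorname{Proj}_W(c^1) = a_2c^2 + \cdots + a_nc^n \in W$ for some $a_i$'s, where $W = C(B)$.
Then $a$ is clearly orthogonal to $c^2, \ldots, c^n$ and $p$. Claim that

$$\det(A^TA) = \det(C^TC) = \|a\|^2\det(B^TB) = \|a\|^2\operatorname{vol}(P(B))^2 = \operatorname{vol}(P(A))^2.$$

In fact, since $c^1 = a + p$ and $a$ is orthogonal to $p$ and to the columns of $B$, the linearity of the determinant in the first row gives

$$\det(A^TA) = \det\begin{bmatrix} a^Ta + p^Tp & p^TB \\ B^Tp & B^TB \end{bmatrix} = \det\begin{bmatrix} a^Ta & 0 \\ B^Tp & B^TB \end{bmatrix} + \det\begin{bmatrix} p^Tp & p^TB \\ B^Tp & B^TB \end{bmatrix}.$$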



5.17 
5.19 


5.20 


5.21 


5.22 


5.23 


5.26 


5.27 


5.28 


5.29 




Since $p$ is a linear combination of the columns of $B$, one can easily verify that
$\det\left(\begin{bmatrix} p^T \\ B^T \end{bmatrix}[p\ B]\right) = 0$, so that $\det(A^TA) = \det(C^TC) =
\|a\|^2\det(B^TB)$. This also shows that the volume is independent of the choice
of $c^1$ at the beginning.
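The key identity in 5.16, $\det(A^TA) = \|a\|^2\det(B^TB)$ with $a = c^1 - \operatorname{Proj}_{C(B)}(c^1)$, can be checked on a random matrix. In this sketch the projection $p$ is computed with `np.linalg.lstsq`, which returns the least squares coefficients so that $Bx$ is the orthogonal projection of $c^1$ onto $C(B)$; the random $5 \times 3$ matrix is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))
c1, B = A[:, 0], A[:, 1:]                      # A = [c^1 B]

coef, *_ = np.linalg.lstsq(B, c1, rcond=None)  # least squares coefficients
p = B @ coef                                   # p = Proj_W(c^1), W = C(B)
a = c1 - p                                     # component of c^1 orthogonal to W

print(np.allclose(B.T @ a, 0))                 # True: a is orthogonal to C(B)
lhs = np.linalg.det(A.T @ A)
rhs = (a @ a) * np.linalg.det(B.T @ B)         # ||a||^2 det(B^T B)
print(np.isclose(lhs, rhs))                    # True
```

Applying the same step repeatedly to $B$ expresses $\operatorname{vol}(P(A))^2 = \det(A^TA)$ as a product of squared lengths of successive orthogonal components.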
Let A = ju! u? u |; Then the volume of the tetrahedron is aa 1. 
$Ax = b$ has a solution for every $b \in \mathbb{R}^m$ if $k = m$. It has infinitely many
solutions if nullity $= n - k = n - m > 0$.
The line is a subspace with an orthonormal basis gall, 1), or is the column 


space of A= | + |. 


1 0 1 
Find a least square solution of 3 : | , = i for (a,b) 
1 3 4 
; 3 
in y =a + bx. Then y = x + z 
Follow Exercise 5.21 with $A = \begin{bmatrix} 1 & -1 & 1 & -1 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 2 & 4 & 8 \\ 1 & 3 & 9 & 27 \end{bmatrix}$.
Then $y = 2x^3 - 4x^2 + 3x - 5$.
Let $\{v^1, \ldots, v^k\}$ be another orthonormal basis for $U \subseteq \mathbb{R}^m$. Then one can
easily show that $A = [u^1 \cdots u^k] = [v^1 \cdots v^k]Q = BQ$ for some orthogonal
matrix $Q$ by Corollary 5.25. Thus $\operatorname{Proj}_U = AA^T = (BQ)(BQ)^T =
BQQ^TB^T = BB^T$.


(1) Let h(a) = 5(f(@)+F(- o F Acie a): 2 Fry 
A has =f f (x)dxe =— [f —t)g(—t)dt 
= vie f(t)g(t)dt = -(f, . - E - a aah z= t. 
(3) Expand the a in the inner product. 
1 sin 0 cos 0 1 0 1 0 
Pa = _ 
ae sinOcos@ cos? @ Pe 1 Rook eh 


. E -iTr p-ł4 _ | sind cos 0 > pirr _| 1 sin@cosé 
QSA e o | Ra De ai cos? 0 ' 


$A^TA = I$ and $\det A^T = \det A$ imply $\det A = \pm 1$.
The matrix $A = \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$ is orthogonal with $\det A = -1$.


False: (1) Consider $x = (1, 0)$ and $y = (-1, 0)$ with $\angle(x, y) = \pi$. (2)
Consider two subspaces $U$ and $W$ of $\mathbb{R}^2$ spanned by $e_1$ and $e_2$, respectively.
(4) $T(u + n) = 2u$ for $u + n \in \operatorname{im}(T) \oplus \ker(T)$ satisfies the condition, but is
not a projection. (5) A projection $A$ is orthogonal if and only if $A^T = A$.
(7) The normal equation is always solvable even if $A^TA$ is not invertible.




(9) For any $x \in \mathbb{R}^n$, $Ax \in C(A)$. (12) An isometry need not be surjective.
(13) An isometry need not be linear: a translation is an isometry, but not
linear.

True: (3) (See (15) of Exercise 4.24.) The set of column vectors in a
permutation matrix $P$ is just $\{e^1, \ldots, e^n\}$ in a certain order, so that $P$ is an
orthogonal matrix. (6). (8). (10). (11). (14). (15) $\|x - y\| + \|x\| \ge \|y\|$.


Chapter 6 
Problems 
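6.3 The characteristic polynomial of an upper triangular matrix satisfies
$$\det(\lambda I - A) = \det\begin{bmatrix} \lambda - a_{11} & * & \cdots & * \\ 0 & \lambda - a_{22} & \cdots & * \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \lambda - a_{nn} \end{bmatrix} = (\lambda - a_{11})\cdots(\lambda - a_{nn}).$$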


6.4 


6.5 


6.6 


6.7 
6.8 


6.9 
6.10 


6.11 


6.12 


Zero is an eigenvalue of $AB$ if and only if $AB$ is singular, if and only if $BA$
is singular, if and only if zero is an eigenvalue of $BA$. Let $\lambda$ be a nonzero
eigenvalue of $AB$ with $(AB)x = \lambda x$ for a nonzero vector $x$. Then the vector
$Bx$ is not zero since $\lambda \ne 0$, but

$$(BA)(Bx) = B(\lambda x) = \lambda(Bx).$$

This means that $Bx$ is an eigenvector of $BA$ belonging to the eigenvalue $\lambda$,
and $\lambda$ is an eigenvalue of $BA$. Similarly, any nonzero eigenvalue of $BA$ is also
an eigenvalue of $AB$.
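The argument above can be mirrored numerically for square $A$ and $B$. The random $4 \times 4$ matrices are an assumption for illustration; the sketch checks that the two spectra agree up to floating-point tolerance and that, for an eigenpair $(\lambda, x)$ of $AB$, the vector $Bx$ is an eigenvector of $BA$ for the same $\lambda$.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

ev_AB = np.linalg.eigvals(A @ B)
ev_BA = np.linalg.eigvals(B @ A)
# Each eigenvalue of AB appears (numerically) among those of BA.
print(all(np.min(np.abs(ev_BA - l)) < 1e-8 for l in ev_AB))  # True

# The construction in the hint: (AB)x = lam x with lam != 0 gives (BA)(Bx) = lam (Bx).
lam, X = np.linalg.eig(A @ B)
x = X[:, 0]
y = B @ x
print(np.allclose((B @ A) @ y, lam[0] * y))  # True
```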


Check with $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$.


Consider the matrices | : i and | Tao | 


If $A$ is invertible, then $AB = A(BA)A^{-1}$.
(1) Use $\det A = \lambda_1 \cdots \lambda_n$. (2) $Ax = \lambda x$ if and only if $x = \lambda A^{-1}x$.


asil Tas g | Then A=QDQ71=| 7 ah 


(1) The eigenvalues of A are 1, 1, —3, and their associated eigenvectors are 


(1) If $Q = [x_1\ x_2\ x_3]$ diagonalizes $A$, then the diagonal matrix must be $\lambda I$ and
$AQ = Q(\lambda I)$. Expand this equation and compare the corresponding columns
of the equation to find a contradiction on the invertibility of $Q$.


With the basis $\alpha = \{1, x, x^2\}$, $[T]_\alpha = A =$


oor 
on oO 
wnoneo 




6.13 


6.14 


6.15 


6.16 


6.17 


6.18 
6.19 
6.20 
6.21 


6.22 
6.23 


6.24 
6.25 


6.26 
6.28 




With the standard basis $\alpha$ for $M_{2\times 2}(\mathbb{R})$,
$$[T]_\alpha = A = \begin{bmatrix} 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{bmatrix}.$$
The eigenvalues are 3, 1, 1, $-1$, and their associated eigenvectors are $(1, 1, 1, 1)$, $(-1, 0, 1, 0)$, $(0, -1, 0, 1)$, and $(-1, 1, -1, 1)$,
respectively.

With respect to the standard basis $\alpha$, $[T]_\alpha = \begin{bmatrix} 4 & 0 & 1 \\ 2 & 3 & 2 \\ 1 & 0 & 4 \end{bmatrix}$ with eigenvalues
3, 3, 5 and eigenvectors $(0, 1, 0)$, $(-1, 0, 1)$ and $(1, 2, 1)$, respectively.

(1) If x = 0, clear. Suppose x # 0 Æ y. For any scalar k, 

0 < (x — ky,x — ky) = (x, x) — k(x, y) — kly, x) + kk(y,y). Let k = x) 
to obtain |(x,x)(y,y) — |(x, y)|? > 0. Note that equality holds if and only if 
x = ky for some scalar k. 

(2) Expand ||x + y||? = (x +y, x + y) and use (1). 

(1) Eigenvalues are 0, 0, 2 and their eigenvectors are (1,0,—7) and (0,1,0), 
respectively. (2) Eigenvalues are 3, H5, S; and their eigenvectors are 


(1,—i, 454), (51,1, 51 + i)), and (51,1, H3 (1 + 0), respec- 
tively. 

Suppose that $x$ and $y$ are linearly independent, and consider a linear dependence
$a(x + y) + b(x - y) = 0$ of $x + y$ and $x - y$. Then $0 = (a + b)x + (a - b)y$.
Since $x$ and $y$ are linearly independent, we have $a + b = 0$ and $a - b = 0$, which
is possible only for $a = 0 = b$. Thus $x + y$ and $x - y$ are linearly independent.
Conversely, if $x + y$ and $x - y$ are linearly independent, then a linear
dependence $ax + by = 0$ of $x$ and $y$ gives $\frac{1}{2}(a + b)(x + y) + \frac{1}{2}(a - b)(x - y) = 0$.
Thus we get $a = 0 = b$, and so $x$ and $y$ are linearly independent.

Refer to the real case. 

(AB)# = (AB)? = BA = BHAY, 

(AP ARE = (AtA) =]. 

The determinant is just the product of the eigenvalues and a Hermitian ma- 
trix has only real eigenvalues. 

See Exercise 6.9. 

To prove (3) directly, show that A(x - y) = E(x : y) by using the fact that 
AĦx = —ux when Ax = px. 

A”® = BË + (iC)# = BT — iCT =-B-iC=—A. 

+AB =(AB)¥ = BHA" = (+B)(+A) = BA, + if they are Hermitian, — if 
they are skew-Hermitian. 

Note that det U” = det U, and 1 = det J = det(U#U) = | det U|?. 


Since A~! = A”, (AB)#AB =I. 


| 
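The eigenpairs stated in Problems 6.13 and 6.14 (and, in the same way, those of Exercises 6.1(4) and 6.16 further below) can be checked mechanically by confirming $Ax = \lambda x$ in exact integer arithmetic. This is a minimal sketch; the matrix for Exercise 6.1(4) is not printed in the hints, but its stated eigenvectors form a basis, and the only matrix killing every $e_i - e_{i+1}$ and sending $(1,1,1,1)$ to $4(1,1,1,1)$ is the $4\times 4$ all-ones matrix, which is what the code uses.

```python
# Verify A x = lambda x for eigenpairs stated in the hints (exact integer arithmetic).

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def is_eigenpair(A, lam, x):
    """True if x is nonzero and A x == lam * x."""
    return any(x) and matvec(A, x) == [lam * t for t in x]

cases = {
    # Problem 6.13: [T]_alpha on M_{2x2}(R)
    "6.13": ([[1, 1, 0, 1], [1, 1, 1, 0], [0, 1, 1, 1], [1, 0, 1, 1]],
             [(3, [1, 1, 1, 1]), (1, [-1, 0, 1, 0]),
              (1, [0, -1, 0, 1]), (-1, [-1, 1, -1, 1])]),
    # Problem 6.14: [T]_alpha in the standard basis
    "6.14": ([[4, 0, 1], [2, 3, 2], [1, 0, 4]],
             [(3, [0, 1, 0]), (3, [-1, 0, 1]), (5, [1, 2, 1])]),
    # Exercise 6.16: [T]_alpha in the basis {1, x, x^2}
    "Ex 6.16": ([[1, 0, 1], [0, 1, 1], [1, 1, 0]],
                [(2, [1, 1, 1]), (1, [-1, 1, 0]), (-1, [1, 1, -2])]),
    # Exercise 6.1(4): the 4x4 all-ones matrix, forced by the stated eigenpairs
    "Ex 6.1(4)": ([[1] * 4 for _ in range(4)],
                  [(0, [1, -1, 0, 0]), (0, [0, 1, -1, 0]),
                   (0, [0, 0, 1, -1]), (4, [1, 1, 1, 1])]),
}

for name, (A, pairs) in cases.items():
    print(name, all(is_eigenpair(A, lam, x) for lam, x in pairs))
```

Exact integers are used on purpose: an equality test on floats would need a tolerance, whereas here every stated eigenpair either matches exactly or is wrong.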




6.30 Hermitian means the diagonal entries are real, and diagonality implies that the off-diagonal entries are zero. Unitary means the diagonal entries have modulus 1, so these real diagonal entries must be $\pm 1$.


siV3 +4 —ZiVv34+ 3 5 — 5iv3 0 
32 (1)IfU =| 8 2 6 2 UAU = a 
6.32 (1) fU -i3 13 ,U-1AU i Ly hiv 
0 0 1 -1 0 0 
D) FU=]| ffi gh Stet | (U TAV =| 0 2i 1 
0 -1 0 0 0 2i 


6.34 (4) Normal with eigenvalues $1 \pm i$, so that it is unitarily diagonalizable but not orthogonally diagonalizable.

6.36 This is a normal matrix. From a direct computation, one can find the eigenvalues $1 - i$, $1 - i$ and $1 + 2i$, and the corresponding eigenvectors $(-1, 0, 1)$, $(-1, 1, 0)$ and $(1, 1, 1)$, respectively; the first two are not orthogonal. But by an orthonormalization (Gram-Schmidt within the eigenspace of $1 - i$), one can obtain a unitary transition matrix, so that $A$ is unitarily diagonalizable.
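The orthonormalization step in 6.36 can be sketched with classical Gram-Schmidt applied to the three listed eigenvectors. This is a sketch under two assumptions: the inner product is taken to be conjugate-linear in the first argument, $\langle u, v\rangle = \sum_i \bar{u}_i v_i$, and only the eigenvectors (not the matrix itself, which is not reproduced here) are used. Because Gram-Schmidt only mixes the second vector with the first, and both belong to the eigenvalue $1 - i$, every output vector is still an eigenvector.

```python
import math

def dot(u, v):
    """Complex inner product <u, v> = sum(conj(u_i) * v_i)."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthonormal vectors spanning the same nested subspaces."""
    basis = []
    for v in vectors:
        w = [complex(c) for c in v]
        for q in basis:
            coeff = dot(q, w)                       # component of w along unit vector q
            w = [wi - coeff * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w).real)
        basis.append([wi / norm for wi in w])
    return basis

# Eigenvectors from 6.36: the first two share the eigenvalue 1 - i.
vecs = [(-1, 0, 1), (-1, 1, 0), (1, 1, 1)]
Q = gram_schmidt(vecs)
for i in range(3):
    print([round(abs(dot(Q[i], Q[j])), 12) for j in range(3)])
```

The columns of the resulting unitary transition matrix are `Q[0]`, `Q[1]`, `Q[2]`; the second one is a multiple of $(-\frac12, 1, -\frac12)$.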

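6.37 $A^HA = (H_1 - iH_2)(H_1 + iH_2) = H_1^2 + H_2^2 + i(H_1H_2 - H_2H_1)$ and $AA^H = (H_1 + iH_2)(H_1 - iH_2) = H_1^2 + H_2^2 - i(H_1H_2 - H_2H_1)$, so $A^HA = AA^H$ if and only if $H_1H_2 - H_2H_1 = 0$.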

6.38 In each subproblem, one direction has already been proven in the theorems. For the other direction, suppose that $U^HAU = D$ for a unitary matrix $U$ and a diagonal matrix $D$.

(1) and (2). If all the eigenvalues of $A$ are real (or purely imaginary), then the diagonal entries of $D$ are all real (or purely imaginary). Thus $D^H = \pm D$, so that $A$ is Hermitian (or skew-Hermitian).
(3) The diagonal entries $\lambda$ of $D$ satisfy $|\lambda| = 1$. Thus $D^H = D^{-1}$, and $A^H = UD^HU^H = UD^{-1}U^{-1} = A^{-1}$.
6.39 $Q = \dfrac{1}{\sqrt{6}}\begin{bmatrix} \sqrt{3} & -\sqrt{2} & -1\\ 0 & \sqrt{2} & -2\\ \sqrt{3} & \sqrt{2} & 1 \end{bmatrix}$
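A quick numerical sanity check that the matrix $Q$ in 6.39 is orthogonal, i.e. $Q^TQ = I$. This is a minimal floating-point sketch, so the comparison uses a small tolerance rather than exact equality.

```python
import math

s2, s3, s6 = math.sqrt(2), math.sqrt(3), math.sqrt(6)
# Q from 6.39, with the common factor 1/sqrt(6) applied entrywise.
Q = [[s3 / s6, -s2 / s6, -1 / s6],
     [0.0,      s2 / s6, -2 / s6],
     [s3 / s6,  s2 / s6,  1 / s6]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

QtQ = matmul(transpose(Q), Q)    # should be the 3x3 identity up to rounding
for row in QtQ:
    print([round(x, 12) for x in row])
```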
1 -l 1 1 
St 3 
6a (1) A= 3] _j ileal i 
(2) B Be 1 Mev Or) EEN 1 U-=v6)(2+i) 
6 (1+Vv6)(2—i) 7+2V6 j 6 (1—v6)(2—i) 7—2V6 
5 5 5 5 


6.41 Let $A = \lambda_1P_1 + \cdots + \lambda_kP_k$ be the spectral decomposition of $A$. Then
$A^H = \bar{\lambda}_1P_1 + \cdots + \bar{\lambda}_kP_k = \lambda_1P_1 + \cdots + \lambda_kP_k = A$.


6.42 Take the Lagrange polynomials $f_i$ such that $f_i(\lambda_j) = \delta_{ij}$, associated with the $\lambda_j$'s as in Section 2.9.2. Then, by Corollary 6.20,
$$f_i(A) = f_i(\lambda_1)P_1 + \cdots + f_i(\lambda_k)P_k = \delta_{i1}P_1 + \cdots + \delta_{ik}P_k = P_i.$$
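The identity $f_i(A) = P_i$ of Problem 6.42 can be checked exactly on a concrete matrix. As an illustrative choice, the sketch below uses the matrix obtained by adding up the spectral decomposition given in the answer to Exercise 6.37, namely $A = \begin{bmatrix}2&1&1\\1&2&1\\1&1&2\end{bmatrix}$ with distinct eigenvalues $1$ and $4$. Each Lagrange polynomial is evaluated at $A$ as $f_i(A) = \prod_{j\neq i}(A - \lambda_jI)/(\lambda_i - \lambda_j)$, using `fractions.Fraction` so the projections come out exactly.

```python
from fractions import Fraction

A = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]   # from the spectral decomposition in Exercise 6.37
n = len(A)
I = [[int(i == j) for j in range(n)] for i in range(n)]
eigs = [1, 4]                            # the distinct eigenvalues

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def lagrange_at_matrix(i):
    """Evaluate f_i(A) = prod_{j != i} (A - lam_j I) / (lam_i - lam_j) exactly."""
    P = I
    for j, lam_j in enumerate(eigs):
        if j != i:
            factor = [[Fraction(A[r][c] - lam_j * I[r][c], eigs[i] - lam_j)
                       for c in range(n)] for r in range(n)]
            P = matmul(P, factor)
    return P

P1, P2 = lagrange_at_matrix(0), lagrange_at_matrix(1)
print(P1)   # expected: (1/3) [[2,-1,-1],[-1,2,-1],[-1,-1,2]] as Fractions
print(P2)   # expected: (1/3) times the all-ones matrix
```

One can then confirm the defining properties of the spectral projections: $P_1^2 = P_1$, $P_1P_2 = 0$, $P_1 + P_2 = I$ and $1\cdot P_1 + 4\cdot P_2 = A$.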


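6.43 (1) $A = \begin{bmatrix}-1\\ 2\\ 2\end{bmatrix} = \begin{bmatrix}-1/3\\ 2/3\\ 2/3\end{bmatrix}[3][1]$, so
$$A^+ = [1]\left[\tfrac{1}{3}\right]\begin{bmatrix}-1/3 & 2/3 & 2/3\end{bmatrix} = \begin{bmatrix}-1/9 & 2/9 & 2/9\end{bmatrix}.$$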


1 1 1 2 1
Q)B=| 7 S J ye v6 t|
Ye wa ee a ae 0
1/2 0
a
=|) E

6.44 Let $A = QR$ be a QR decomposition of $A$ by an $m \times k$ matrix $Q$ with $k$ orthonormal columns and a $k \times n$ matrix $R$, both of rank $k$. Then, by Corollary 6.25, $Q^+ = (Q^HQ)^{-1}Q^H = Q^H$ and $R^+ = R^H(RR^H)^{-1}$. Thus $A^+ = R^+Q^+ = R^H(RR^H)^{-1}Q^H$ by Lemma 6.28.

6.46 By elementary row operations and then column operations on the blocks,
$$\det B = \det\begin{bmatrix} X & -Y\\ Y & X \end{bmatrix} = \det\begin{bmatrix} X + iY & -Y + iX\\ Y & X \end{bmatrix} = \det\begin{bmatrix} X + iY & 0\\ Y & X - iY \end{bmatrix} = \det A\,\det\bar{A} = |\det A|^2.$$

Exercises 6.11 


6.1 (4) $0$ of multiplicity 3, $4$ of multiplicity 1. Eigenvectors are $e_i - e_{i+1}$ for $1 \le i \le 3$ and $\sum_{i=1}^{4} e_i$.

6.2 $f(\lambda) = (\lambda + 2)(\lambda^2 - 8\lambda + 15)$, $\lambda_1 = -2$, $\lambda_2 = 3$, $\lambda_3 = 5$,
$x_1 = (-35, 12, 19)$, $x_2 = (0, 3, 1)$, $x_3 = (0, 1, 1)$.

6.4 $\{v\}$ is a basis for $N(A)$, and $\{u, w\}$ is a basis for $C(A)$.

6.5 Note that the order in the product does not matter, and any eigenvector of $A$ is killed by $B$. Since the eigenvalues are all different, the eigenvectors belonging to 1, 2, 3 form a basis. Thus $B = 0$, that is, $B$ has only the zero eigenvalue, so all vectors are eigenvectors of $B$.

6.7
[i -2 -1
A=QDQ=35|1 4 -1
te a

6.8 Note that $\mathbb{R}^n = W \oplus \mathrm{Ker}(P)$, and $P(w) = w$ for $w \in W$ and $P(v) = 0$ for $v \in \mathrm{Ker}(P)$. Thus, the eigenspace belonging to $\lambda = 1$ is $W$, and that to $\lambda = 0$ is $\mathrm{Ker}(P)$.

6.9 For any $w \in \mathbb{R}^n$, $Aw = u(v^Tw) = (v \cdot w)u$. Thus $Au = (v \cdot u)u$, so $u$ is an eigenvector belonging to the eigenvalue $\lambda = v \cdot u$. The other eigenvectors are those in $v^\perp$, with eigenvalue zero. Thus, $A$ has either two eigenspaces, $E(\lambda)$ (1-dimensional, spanned by $u$) and $E(0) = v^\perp$, if $v \cdot u \neq 0$, or just one eigenspace $E(0) = v^\perp$ if $v \cdot u = 0$ (in which case $A \neq 0$ is not diagonalizable).

6.10 $\lambda v = Av = A^2v = \lambda^2v$ implies $\lambda(\lambda - 1) = 0$, since $v \neq 0$.

6.12 Use $\mathrm{tr}(A) = \lambda_1 + \cdots + \lambda_n = a_{11} + \cdots + a_{nn}$.

6.13 (1) If $k = 1$, clearly $x_1 \in U$. Suppose that the claim is true for $k$, and $x_1 + \cdots + x_k + x_{k+1} = u \in U$ with $x_i \in E_{\lambda_i}(A)$. Then, from
$$A(x_1 + \cdots + x_{k+1}) = \lambda_1x_1 + \cdots + \lambda_kx_k + \lambda_{k+1}x_{k+1} = Au \in U,$$
$$\lambda_{k+1}x_1 + \cdots + \lambda_{k+1}x_k + \lambda_{k+1}x_{k+1} = \lambda_{k+1}u \in U,$$


we get $(\lambda_1 - \lambda_{k+1})x_1 + \cdots + (\lambda_k - \lambda_{k+1})x_k = Au - \lambda_{k+1}u \in U$. Since each $\lambda_i - \lambda_{k+1} \neq 0$, the induction hypothesis gives $x_1, \ldots, x_k \in U$, and then $x_{k+1} = u - (x_1 + \cdots + x_k) \in U$. Thus, by induction, all the $x_i$'s are in $U$.

(2) Write $\mathbb{R}^n = E_{\lambda_1}(A) \oplus \cdots \oplus E_{\lambda_k}(A)$. Then the subspaces $U \cap E_{\lambda_i}(A)$, for $i = 1, \ldots, k$, span $U$, since any basis vector $u$ in $U$ is of the form $u = x_1 + \cdots + x_k$ with $x_i \in U \cap E_{\lambda_i}(A)$ by (1). Thus $U = (U \cap E_{\lambda_1}(A)) \oplus \cdots \oplus (U \cap E_{\lambda_k}(A))$.

6.14 $A = QD_1Q^{-1}$ and $B = QD_2Q^{-1}$ imply $AB = BA$ since $D_1D_2 = D_2D_1$. Conversely, suppose $AB = BA$. If $Ax = \lambda_ix$ with $x \in E_{\lambda_i}(A)$, then $ABx = BAx = \lambda_iBx$ implies $Bx \in E_{\lambda_i}(A)$. That is, each eigenspace $E_{\lambda_i}(A)$ is invariant under $B$, so that the restriction of $B$ to $E_{\lambda_i}(A)$ is diagonalizable (by Exercise 6.13 applied to $B$).

6.16 With respect to the basis $\alpha = \{1, x, x^2\}$, $[T]_\alpha = \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 1\\ 1 & 1 & 0 \end{bmatrix}$. The eigenvalues are $2, 1, -1$ and the eigenvectors are $(1, 1, 1)$, $(-1, 1, 0)$ and $(1, 1, -2)$, respectively.

6.19 (1) $\sqrt{6}$, (2) $4$.

6.22 (1) $\lambda = i$, $x = t(1, -2 - i)$; $\lambda = -i$, $x = t(1, -2 + i)$.
(2) $\lambda = 1$, $x = t(i, 1)$; $\lambda = -1$, ER SI)
(3) Eigenvalues are $2$, $2 + i$, $2 - i$, and eigenvectors are (0, -1, 1)), (1, -4(2 + 4), 0), (1, -1(2 - i), 1).
(4) Eigenvalues are $0$, $-1$, $2$, and eigenvectors are (1, 0, -1)), (1, -7, 1), (1, 2i, 1).

6.24 $A + cI$ is invertible if $\det(A + cI) \neq 0$. However, for any matrix $A$, the equation $\det(A + cI) = 0$, as a complex polynomial equation in $c$, always has a (complex) solution. For the real matrix $A = \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix}$ with $\theta \neq k\pi$, $A + rI$ is invertible for every real number $r$, since $A$ has no real eigenvalues.

6.25
1 i l-i
1 1 L= 1 ;
lite a |) e v2 0

6.28
1 a -1 +i
aa-4[1 2]

6.30 (1) Unitary; diagonal entries are $\{1, i\}$. (2) Orthogonal; $\{\cos\theta + i\sin\theta, \cos\theta - i\sin\theta\}$, where $\theta = \cos^{-1}(0.6)$. (3) Hermitian; $\{1, 1 + \sqrt{2}, 1 - \sqrt{2}\}$.

6.31 $\det(U - \lambda I) = \det(U - \lambda I)^T = \det(U^T - \lambda I)$.

6.32
1 |1 -1 H 24- 0
v=] ar D=u#au =| 0 ani

6.33 If $\lambda$ is an eigenvalue of $A$, then $\lambda^k$ is an eigenvalue of $A^k$. Thus, if $A^k = 0$, then $\lambda^k = 0$, so $\lambda = 0$. Conversely, by Schur's lemma, $A$ is similar to an upper triangular matrix whose diagonal entries are the eigenvalues of $A$, which are assumed to be zero. Then it is easy to conclude that $A$ is nilpotent.

6.34 (1) Let $\lambda_1, \ldots, \lambda_k$ be the distinct eigenvalues of $A$, and $A = \lambda_1P_1 + \cdots + \lambda_kP_k$ the spectral decomposition of $A$. Then $A^H = \bar{\lambda}_1P_1 + \cdots + \bar{\lambda}_kP_k$. Problem 6.42


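shows that $P_i = f_i(A)$, where the $f_i$'s are the Lagrange polynomials associated with the $\lambda_i$'s as in Section 2.9.2. Then
$$A^H = \bar{\lambda}_1f_1(A) + \cdots + \bar{\lambda}_kf_k(A) = g(A),$$
where $g = \sum_i \bar{\lambda}_if_i$. The converse is clear.
(2) Clear from (1), since $A^H = g(A)$.

6.35 (See Exercise 6.14.) Since $AB = BA$, each $E_{\lambda_i}(A)$ is $B$-invariant. Since $B$ is normal, $B^H = g(B)$ for some polynomial $g$. Thus each $E_{\lambda_i}(A)$ is both $B$- and $B^H$-invariant. So the restriction of $B$ to $E_{\lambda_i}(A)$ is normal, since $B$ is normal. That is, $A$ and $B$ have orthonormal eigenvectors in $E_{\lambda_i}(A) \cap E_{\mu_j}(B)$.

6.36 (2) The characteristic polynomial of $W$ is $f(\lambda) = \lambda^n - 1$. Hence, $\lambda_k = \omega^k = e^{2\pi ik/n}$ for $k = 0, 1, 2, \ldots, n - 1$.
(3) The eigenvalues of $A$ are, for $k = 0, 1, 2, \ldots, n - 1$,
$$\lambda_k = \sum_{j=0}^{n-1} a_{j+1}\omega^{jk} = a_1 + a_2\omega^k + \cdots + a_n\omega^{(n-1)k}.$$
(5) The characteristic polynomial of $B$ is $f(\lambda) = (\lambda - n + 1)(\lambda + 1)^{n-1}$.

6.37 The eigenvalues are $1, 1, 4$, and the orthonormal eigenvectors are $(-\frac{1}{\sqrt2}, \frac{1}{\sqrt2}, 0)$, $(-\frac{1}{\sqrt6}, -\frac{1}{\sqrt6}, \frac{2}{\sqrt6})$ and $(\frac{1}{\sqrt3}, \frac{1}{\sqrt3}, \frac{1}{\sqrt3})$. Therefore,
$$A = 1\cdot\frac{1}{3}\begin{bmatrix} 2 & -1 & -1\\ -1 & 2 & -1\\ -1 & -1 & 2 \end{bmatrix} + 4\cdot\frac{1}{3}\begin{bmatrix} 1 & 1 & 1\\ 1 & 1 & 1\\ 1 & 1 & 1 \end{bmatrix}.$$

6.38 Note that $A^+b = x_r$ is the unique optimal least squares solution in $\mathcal{R}(A) = \mathcal{R}(U)$. For any $b \in \mathbb{R}^m$, $U^T(UU^T)^{-1}(L^TL)^{-1}L^Tb$ satisfies the normal equation:
$$\begin{aligned} A^TA\,U^T(UU^T)^{-1}(L^TL)^{-1}L^Tb &= (U^TL^TLU)\,U^T(UU^T)^{-1}(L^TL)^{-1}L^Tb\\ &= U^T(L^TL)(UU^T)(UU^T)^{-1}(L^TL)^{-1}L^Tb\\ &= U^TL^Tb = A^Tb, \end{aligned}$$
so that it is a least squares solution. Moreover, it is in $\mathcal{R}(A) = \mathcal{R}(U)$, since $U^T$ times a column vector is a linear combination of the row vectors of $U$, which form a basis for $\mathcal{R}(A)$. Therefore, it is the optimal one, so that $U^T(UU^T)^{-1}(L^TL)^{-1}L^Tb = A^+b$ for every $b$. That is, $U^T(UU^T)^{-1}(L^TL)^{-1}L^T = A^+$.

6.39 For $A = [x]$, the only eigenvalue of $A^TA = [x^Tx] = [\|x\|^2]$ is $\|x\|^2$, so that its singular value is $\|x\|$. The unit eigenvector of $A^TA$ is $u = [1]$, and that of $AA^T = xx^T$ is $v = \frac{x}{\|x\|}$. Thus $A = \frac{x}{\|x\|}\,[\|x\|]\,[1]$. Its pseudo-inverse is
$$A^+ = [1]\left[\frac{1}{\|x\|}\right]\left(\frac{x}{\|x\|}\right)^T = \frac{x^T}{\|x\|^2}.$$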
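The eigenvalue formula of 6.36(3) for a circulant matrix can be checked numerically. The sketch below assumes the circulant has first row $(a_1, \ldots, a_n)$ with each later row shifted one place to the right, so entry $(i, m)$ is $a_{((m - i) \bmod n) + 1}$, which is the matrix $a_1I + a_2W + \cdots + a_nW^{n-1}$ when $W$ is the cyclic shift with $1$'s just above the diagonal and in the bottom-left corner. The first row used is a hypothetical example, and $v_k = (1, \omega^k, \omega^{2k}, \ldots)$ is tested as the eigenvector for $\lambda_k$.

```python
import cmath

def circulant(a):
    """Circulant matrix with first row a: entry (i, m) is a[(m - i) % n]."""
    n = len(a)
    return [[a[(m - i) % n] for m in range(n)] for i in range(n)]

def circulant_eigenvalues(a):
    """lambda_k = sum_j a_{j+1} w^{jk}, with w = exp(2*pi*i/n)."""
    n = len(a)
    w = cmath.exp(2j * cmath.pi / n)
    return [sum(a[j] * w ** (j * k) for j in range(n)) for k in range(n)]

a = [1, 2, 0, -1]                  # hypothetical first row a_1, ..., a_n
A = circulant(a)
n = len(a)
w = cmath.exp(2j * cmath.pi / n)
for k, lam in enumerate(circulant_eigenvalues(a)):
    v = [w ** (k * m) for m in range(n)]           # candidate eigenvector v_k
    Av = [sum(A[i][m] * v[m] for m in range(n)) for i in range(n)]
    err = max(abs(x - lam * y) for x, y in zip(Av, v))
    print(k, lam, err < 1e-9)
```

Note that $\lambda_0 = a_1 + \cdots + a_n$ always, with eigenvector $(1, 1, \ldots, 1)$.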




6.40 The eigenvalues of $AA^T$ are $5, 3, 0$. The pseudo-inverse $A^+$ is
$$A^+ = \begin{bmatrix} 1/5 & 0 & 1/5\\ -1/15 & 1/3 & 4/15\\ 4/15 & -1/3 & -1/15\\ 1/5 & 0 & 1/5 \end{bmatrix}.$$


6.41 $O_{m\times n} = I_{m\times m}\,O_{m\times n}\,I_{n\times n}$, and $O_{m\times n}^+ = O_{n\times m}$.
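The full-rank factorization formula of Exercise 6.38, $A^+ = U^T(UU^T)^{-1}(L^TL)^{-1}L^T$ for $A = LU$ with $L$ of full column rank and $U$ of full row rank, can be implemented directly in exact arithmetic. This is a sketch, not the book's algorithm: the helper names are hypothetical, and the small Gauss-Jordan inverse is only meant for the tiny nonsingular Gram matrices that appear here. The case $A = [x]$ of Exercise 6.39 is the factorization $L = x$, $U = [1]$, and reproduces the answer to Problem 6.43(1).

```python
from fractions import Fraction

def transpose(M):
    return [list(r) for r in zip(*M)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def inverse(M):
    """Gauss-Jordan inverse of a square nonsingular matrix using exact Fractions."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if aug[r][c] != 0)   # pivot row
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [x / piv for x in aug[c]]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def pinv_full_rank(L, U):
    """A^+ = U^T (U U^T)^{-1} (L^T L)^{-1} L^T for a full-rank factorization A = L U."""
    Ut, Lt = transpose(U), transpose(L)
    return matmul(matmul(matmul(Ut, inverse(matmul(U, Ut))),
                         inverse(matmul(Lt, L))), Lt)

# Exercise 6.39 / Problem 6.43(1): A = [x] with L = x (3x1) and U = [1].
x = [[-1], [2], [2]]
print(pinv_full_rank(x, [[1]]))          # x^T / ||x||^2 = [-1/9, 2/9, 2/9]

# A hypothetical rank-2 example: L is 3x2, U is 2x3.
L = [[1, 0], [2, 1], [0, 1]]
U = [[1, 0, 1], [0, 1, 1]]
A = matmul(L, U)
Ap = pinv_full_rank(L, U)
print(matmul(matmul(A, Ap), A) == A)     # Moore-Penrose condition A A^+ A = A
```

The remaining Moore-Penrose conditions ($A^+AA^+ = A^+$ and symmetry of $AA^+$ and $A^+A$) can be checked the same way.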


6.42 False: (1) For A = k |2- Q- ‘AQ means that | | lela eh 


0 
1 
(2) Consider A = E pant B = E Jal neal a 


) 
8) If $A$ is similar to $I + A$, then they must have the same eigenvalues, so that $\mathrm{tr}(A) = \mathrm{tr}(I + A) = n + \mathrm{tr}(A)$, which is impossible. (6).


Consider $\begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix}$ with $\theta \neq k\pi$. (10) Since $\mathrm{rank}(A) = \mathrm{rank}(A^HA) = \mathrm{rank}(AA^H)$, $\lambda = 0$ may be an eigenvalue of both $A^HA$ and $AA^H$ with the same multiplicity. Now suppose $\lambda \neq 0$ is an eigenvalue of $A^HA$ with eigenvector $x$. Then $Ax \neq 0$, and $(AA^H)Ax = A(A^HA)x = \lambda Ax$ implies that $\lambda$ is also an eigenvalue of $AA^H$ with eigenvector $Ax$. The converse is the same. Hence, $A^HA$ and $AA^H$ have the same eigenvalues.


(8) $\mathbb{R}^n$ is not a complex vector space. (10) Consider |


dii > | 09 P= i o o as) A=| 3 ear 


True: (7) $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$. (9). (10). (12) $A$ is a unitary matrix. (13) A permutation matrix is an orthogonal matrix.

(15) If $A$ is Hermitian, then it is unitarily diagonalizable: $U^{-1}AU = D$, where $U$ is a unitary matrix and $D$ is a diagonal matrix. Moreover, if $A$ is nilpotent, that is, $A^k = 0$ for some $k \ge 1$, then $D^k = 0$, so all the eigenvalues of $A$ must be zero. Thus the diagonal matrix $D$ is the zero matrix, and so is $A$.
(16) Schur's Lemma. (17) If $A + iI$ is not invertible, then $\det(A + iI) = 0$, which means $-i$ is an eigenvalue of $A$; but a non-real complex number cannot be an eigenvalue of a Hermitian matrix.

(19) Modify (17): $-4$ cannot be an eigenvalue of an orthogonal matrix.

(20) If $U^HAU = D$ with real diagonal entries, then $A^H = A$.


Chapter 7 
Problems 
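7.1 Note that $\begin{bmatrix} a_{n+1}\\ a_n\\ a_{n-1} \end{bmatrix} = \begin{bmatrix} 2 & 1 & -2\\ 1 & 0 & 0\\ 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} a_n\\ a_{n-1}\\ a_{n-2} \end{bmatrix}$. The eigenvalues are $1$, $2$, $-1$ and eigenvectors are $(1, 1, 1)$, $(4, 2, 1)$ and $(1, -1, 1)$, respectively. It turns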
2 2” 
t that an = 2 — (—1)” = — — 
out that a (—1) qo Fe 
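For Problem 7.1, one can check both that the listed vectors are eigenvectors of the companion matrix and that every sequence of the form $c_1 + c_2 2^n + c_3(-1)^n$ satisfies the recurrence $a_{n+1} = 2a_n + a_{n-1} - 2a_{n-2}$ read off from its first row. The coefficients $c_1, c_2, c_3$ below are hypothetical; the particular values in the book depend on the initial terms of the problem.

```python
M = [[2, 1, -2],
     [1, 0, 0],
     [0, 1, 0]]     # (a_{n+1}, a_n, a_{n-1}) = M (a_n, a_{n-1}, a_{n-2})

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

eigenpairs = [(1, [1, 1, 1]), (2, [4, 2, 1]), (-1, [1, -1, 1])]
for lam, v in eigenpairs:
    print(lam, matvec(M, v) == [lam * t for t in v])

def a(n, c1=3, c2=-1, c3=5):        # hypothetical coefficients
    """A general solution built from the eigenvalues 1, 2, -1."""
    return c1 + c2 * 2 ** n + c3 * (-1) ** n

print(all(a(n + 1) == 2 * a(n) + a(n - 1) - 2 * a(n - 2) for n in range(2, 30)))
```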


7.2 Write the characteristic polynomial as
$$f(x) = x^k - a_{k-1}x^{k-1} - \cdots - a_1x - a_0 = (x - \lambda)^m g(x),$$
where $g(\lambda) \neq 0$. Then clearly $f(\lambda) = f'(\lambda) = \cdots = f^{(m-1)}(\lambda) = 0$. For $n \ge k$, let $f_1(x) = x^{n-k}f(x) = x^n - a_{k-1}x^{n-1} - \cdots - a_0x^{n-k}$. Then one can easily show that
$$\lambda f_1'(\lambda) = n\lambda^n - a_{k-1}(n-1)\lambda^{n-1} - \cdots - a_0(n-k)\lambda^{n-k} = 0,$$
since $f_1'(\lambda) = (n-k)\lambda^{n-k-1}f(\lambda) + \lambda^{n-k}f'(\lambda) = 0$. Inductively, setting $f_j(x) = xf_{j-1}'(x)$ for $j \ge 2$, one gets $f_m(\lambda) = 0$, that is,
$$n^{m-1}\lambda^n - a_{k-1}(n-1)^{m-1}\lambda^{n-1} - \cdots - a_0(n-k)^{m-1}\lambda^{n-k} = 0.$$
Thus $x_n = \lambda^n, n\lambda^n, \ldots, n^{m-1}\lambda^n$ are $m$ solutions. It is not hard to show that they are linearly independent.

7.5 The eigenvalues are $0$, $0.4$ and $1$, and their eigenvectors are $(1, 4, -5)$, $(1, 0, -1)$ and $(3, 2, 5)$, respectively.

7.6 For (1), use $(A + B)^k = \sum_{i=0}^{k}\binom{k}{i}A^iB^{k-i}$ if $AB = BA$. For (2) and (3), use the definition of $e^A$. Use (1) for (4).

7.7 Note that $e^{(A^T)} = (e^A)^T$ by definition (thus, if $A$ is symmetric, so is $e^A$), and use (4).

7.8 Write $A = 2I + N$ with $N = \begin{bmatrix} 0 & 3 & 0\\ 0 & 0 & 3\\ 0 & 0 & 0 \end{bmatrix}$. Then $N^3 = 0$.

7.9 y = 4e?” - ioe; y= ae” + oe”,

7.10 $y_1 = -c_2e^{2x} + c_3e^{3x}$, $y_2 = c_1e^x + 2c_2e^{2x} - c_3e^{3x}$, $y_3 = 2c_2e^{2x} - c_3e^{3x}$; and $y_1 = e^{2x} - 2e^{3x}$, $y_2 = e^x - 2e^{2x} + 2e^{3x}$, $y_3 = -2e^{2x} + 2e^{3x}$.

7.11 ay 3e' - 2
a) [ei], @ | ae
e -t

e
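The conclusion of 7.2, that a root $\lambda$ of multiplicity $m$ yields the solutions $\lambda^n, n\lambda^n, \ldots, n^{m-1}\lambda^n$ of the linear recurrence, can be tested on small hypothetical recurrences: $x_n = 4x_{n-1} - 4x_{n-2}$ (characteristic polynomial $(x-2)^2$) and $x_n = 3x_{n-1} - 3x_{n-2} + x_{n-3}$ (characteristic polynomial $(x-1)^3$). In the second case $n^3$ should fail, since only powers up to $n^{m-1} = n^2$ are covered.

```python
def satisfies(seq, coeffs):
    """True if seq[n] == coeffs[0]*seq[n-1] + ... + coeffs[k-1]*seq[n-k] for every n >= k."""
    k = len(coeffs)
    return all(seq[n] == sum(c * seq[n - 1 - i] for i, c in enumerate(coeffs))
               for n in range(k, len(seq)))

# Double root: x_n = 4 x_{n-1} - 4 x_{n-2}, characteristic polynomial (x - 2)^2.
double = [4, -4]
print(satisfies([2 ** n for n in range(20)], double),
      satisfies([n * 2 ** n for n in range(20)], double))       # True True

# Triple root: x_n = 3 x_{n-1} - 3 x_{n-2} + x_{n-3}, characteristic polynomial (x - 1)^3.
triple = [3, -3, 1]
print([satisfies([n ** j for n in range(20)], triple) for j in range(4)])  # [True, True, True, False]
```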


Exercises 7.6 


7.1 The characteristic equation is $\lambda^2 - x\lambda - 0.18 = 0$. Since $\lambda = 1$ is a solution, $x = 0.82$. The eigenvalues are now $1$, $-0.18$ and the eigenvectors are $(-0.3, -1)$ and $(1, -0.6)$.

7.2 Eigenvalues are $1, 1, 2$ and eigenvectors are $(1, 0, 0)$, $(0, 1, 2)$ and $(1, 2, 3)$. $A^{10}x = (1025, 2050, 3076)$.

7.3 The initial status in 1985 is $\mathbf{x}_0 = (x_0, y_0, z_0) = (0.4, 0.2, 0.4)$, where $x$, $y$, $z$ represent the percentages of large, medium, and small car owners. In 1995, the status is
$$\mathbf{x}_1 = \begin{bmatrix} x_1\\ y_1\\ z_1 \end{bmatrix} = \begin{bmatrix} 0.7 & 0.1 & 0\\ 0.3 & 0.7 & 0.1\\ 0 & 0.2 & 0.9 \end{bmatrix}\begin{bmatrix} 0.4\\ 0.2\\ 0.4 \end{bmatrix} = A\mathbf{x}_0.$$
Thus, in 2025, the status is $\mathbf{x}_4 = A^4\mathbf{x}_0$. The eigenvalues are $0.5$, $0.8$ and $1$, whose eigenvectors are $(-0.41, 0.82, -0.41)$, $(0.47, 0.47, -0.94)$ and $(-0.17, -0.52, -1.04)$, respectively.
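The Markov chain of Exercise 7.3 can be simulated directly: four ten-year steps take the 1985 distribution to $\mathbf{x}_4 = A^4\mathbf{x}_0$ in 2025, and long-run iteration approaches the eigenvector for the eigenvalue $1$, which is proportional to $(1, 3, 6)$, i.e. the distribution $(0.1, 0.3, 0.6)$. This is a plain-float sketch; since every column of $A$ sums to $1$, the total share stays $1$ up to rounding.

```python
A = [[0.7, 0.1, 0.0],
     [0.3, 0.7, 0.1],
     [0.0, 0.2, 0.9]]
x0 = [0.4, 0.2, 0.4]          # (large, medium, small) shares in 1985

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

x = x0
for year in range(1995, 2026, 10):      # 1995, 2005, 2015, 2025
    x = matvec(A, x)
    print(year, [round(t, 4) for t in x])

# Long-run behaviour: the other eigenvalues 0.5 and 0.8 die out under iteration.
y = x0
for _ in range(500):
    y = matvec(A, y)
print([round(t, 6) for t in y])        # approaches (0.1, 0.3, 0.6)
```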




7.4 Clearly, $a_0 = 1$, $a_1 = 2$ and $a_2 = 3$. Inductively, one can easily see that the sequence $\{a_n : n \ge 1\}$ is a Fibonacci sequence: $a_{n+1} = a_n + a_{n-1}$. In fact, for $\{1, 2, \ldots, n\}$, the number of subsets with the required property is the number of such subsets of the set without $n$, plus the number of such subsets of the set without $n$ and $n - 1$, to each of which we just add $n$.
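The counting argument in 7.4 can be compared against brute force. The property itself is not restated in the hint; this sketch assumes it is "contains no two consecutive integers", which matches $a_0 = 1$, $a_1 = 2$, $a_2 = 3$ and the argument of either omitting $n$ or adding $n$ to an admissible subset of $\{1, \ldots, n-2\}$.

```python
from itertools import combinations

def count_brute(n):
    """Subsets of {1, ..., n} with no two consecutive integers (assumed property)."""
    total = 0
    for r in range(n + 1):
        for s in combinations(range(1, n + 1), r):
            if all(b - a > 1 for a, b in zip(s, s[1:])):
                total += 1
    return total

def count_recurrence(n):
    """a_0 = 1, a_1 = 2, a_{n+1} = a_n + a_{n-1}."""
    a, b = 1, 2
    for _ in range(n):
        a, b = b, a + b
    return a

print([count_brute(n) for n in range(8)])        # [1, 2, 3, 5, 8, 13, 21, 34]
print([count_recurrence(n) for n in range(8)])   # the same list
```

The brute-force count is exponential in $n$, so it only serves as a check for small $n$; the recurrence is what one would actually use.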

7.5 (1) Since the eigenvalues of a skew-Hermitian matrix are always purely imaginary (or zero), 1 cannot be an eigenvalue.

(2) Note that, for a skew-Hermitian matrix $A$, $(e^A)^H = e^{A^H} = e^{-A} = (e^A)^{-1}$.
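Two matrix-exponential facts used above can be checked with a truncated power series: the closed form $e^{2I + N} = e^2(I + N + \frac{1}{2}N^2)$ from Problem 7.8 (valid because $2I$ commutes with $N$ and $N^3 = 0$), and, as an instance of 7.5(2), that $e^S$ is orthogonal for a real skew-symmetric $S$. The series routine `expm_series` is a hypothetical helper for small matrices of modest norm, not a production algorithm such as scaling-and-squaring.

```python
import math

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def expm_series(A, terms=40):
    """Truncated power series e^A = sum_{k < terms} A^k / k!."""
    n = len(A)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]                               # A^0 / 0!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, A)]    # A^k / k!
        result = [[r + t for r, t in zip(rr, tt)] for rr, tt in zip(result, term)]
    return result

# Problem 7.8: A = 2I + N with N^3 = 0, so e^A = e^2 (I + N + N^2 / 2).
N = [[0, 3, 0], [0, 0, 3], [0, 0, 0]]
A = [[2 * (i == j) + N[i][j] for j in range(3)] for i in range(3)]
N2 = matmul(N, N)
closed = [[math.exp(2) * ((i == j) + N[i][j] + N2[i][j] / 2) for j in range(3)]
          for i in range(3)]
err = max(abs(a - b) for ra, rb in zip(expm_series(A), closed) for a, b in zip(ra, rb))
print(err)                 # tiny: only rounding error remains

# Exercise 7.5(2): for a real skew-symmetric S, e^S is orthogonal.
t = 0.7
S = [[0.0, -t], [t, 0.0]]
E = expm_series(S)
Et = [list(r) for r in zip(*E)]
print(E)                   # close to [[cos t, -sin t], [sin t, cos t]]
print(matmul(Et, E))       # close to the identity
```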
7.7 (1) $e^A = \begin{bmatrix} e & e - 1\\ 0 & 1 \end{bmatrix}$
7.8 (1) $f(\lambda) = \lambda^3 - 10\lambda^2 + 28\lambda - 24$, eigenvalues are $6, 2, 2$, and eigenvectors are $(1, 2, 1)$, $(-1, 1, 0)$ and $(-1, 0, 1)$.
(2) $f(\lambda) = (\lambda - 1)(\lambda^2 - 6\lambda + 9)$, eigenvalues are $1, 3, 3$, and eigenvectors are $(2, 1, 1)$, $(1, 1, 0)$ and $(1, 0, 1)$.
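7.9 One can easily check that $\det A_n = \det A_{n-1} - \det A_{n-2}$. Set $a_n = \det A_n$, so that $a_n = a_{n-1} - a_{n-2}$. Together with the trivial identity $a_{n-1} = a_{n-1}$, we obtain a matrix equation:
$$\begin{bmatrix} a_n\\ a_{n-1} \end{bmatrix} = \begin{bmatrix} 1 & -1\\ 1 & 0 \end{bmatrix}\begin{bmatrix} a_{n-1}\\ a_{n-2} \end{bmatrix} = A\begin{bmatrix} a_{n-1}\\ a_{n-2} \end{bmatrix}$$
with $a_1 = 1$ and $a_0 = 0$. Using the eigenvalues might make the computation a mess. Instead, one can use the Cayley-Hamilton Theorem 8.13: since the characteristic polynomial of $A$ is $\lambda^2 - \lambda + 1$, $A^2 - A + I = 0$ holds. Thus, $A^3 = A^2 - A = -I$, so $A^6 = I$. One can now easily compute $a_n$ by reducing $n$ modulo 6.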
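The Cayley-Hamilton shortcut in 7.9 can be confirmed numerically: $A^3 = -I$ for $A = \begin{bmatrix}1 & -1\\ 1 & 0\end{bmatrix}$, so $A^6 = I$ and every sequence with $a_n = a_{n-1} - a_{n-2}$ repeats with period 6, whatever its two starting values. The starting values in the sketch are samples only.

```python
A = [[1, -1], [1, 0]]       # (a_n, a_{n-1})^T = A (a_{n-1}, a_{n-2})^T

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

A3 = matmul(matmul(A, A), A)
print(A3)                   # [[-1, 0], [0, -1]], i.e. A^3 = -I

def sequence(first, second, length):
    """First `length` terms of a_n = a_{n-1} - a_{n-2} from two starting values."""
    seq = [first, second]
    while len(seq) < length:
        seq.append(seq[-1] - seq[-2])
    return seq

s = sequence(0, 1, 18)      # sample starting values
print(s)
print(all(s[n] == s[n + 6] for n in range(len(s) - 6)))   # period 6
```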
7.10 Hint: By induction on $n$, it is enough to show that


1 (aiaz) i 1 _ (a1a2... an) 
ay = = » at ss , 
ag (a2) (aza3.-an) (agag ...an) 
(a3a4..-an) 
or $a_1(a_2\cdots a_n) + (a_3\cdots a_n) = (a_1a_2\cdots a_n)$. Use $(a_1a_2\cdots a_n) = (a_n\cdots a_2a_1)$.
yi(z) = —2e?0-2) 4 4e%e-D a E 
7.12 (1) yo(2) = —¢2(1-2) 4 2e2?(2—1) (2) yi(z) = e (cos x — sin x) 
7 = yo(x) = Qe?” sin x. 
y3(a = 2e2(1 z) _ 9¢2(# 1), 
7.13 yı =0, y2 = 2e”, yz = e%. 
Chapter 8 
Problems 
410 D 
8.2 (2) |0 41l, 68) 
00 4 001 1 
0 0 0 1 
8.3 (1) For \ = —1 = ), x2 = (0, 1, 


eee ), and for À = 0, x, = 
(—1, 1, 1). (2) For À = , 5, 0), and for à = —1, 
1 


x1 = (-9, = 


8.4 (1) Write $A = D + J$ with $D = \begin{bmatrix} 2&0&0&0\\ 0&2&0&0\\ 0&0&2&0\\ 0&0&0&1 \end{bmatrix}$ and $J = \begin{bmatrix} 0&1&0&0\\ 0&0&1&0\\ 0&0&0&0\\ 0&0&0&0 \end{bmatrix}$.
Then clearly $J^3 = 0$ and $DJ = JD$. Thus
$$A^k = D^k + \binom{k}{1}D^{k-1}J + \binom{k}{2}D^{k-2}J^2 = \begin{bmatrix} 2^k & k2^{k-1} & \frac{k(k-1)}{2}2^{k-2} & 0\\ 0 & 2^k & k2^{k-1} & 0\\ 0 & 0 & 2^k & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

8.5 For any $u, v \in \mathbb{C}$,
$$|u + v|^2 = (u + v)(\bar{u} + \bar{v}) = |u|^2 + 2\,\mathrm{Re}(u\bar{v}) + |v|^2 \le |u|^2 + 2|u\bar{v}| + |v|^2 = |u|^2 + 2|u||v| + |v|^2 = (|u| + |v|)^2.$$
The equality holds $\iff$ $|u\bar{v}| = \mathrm{Re}(u\bar{v})$, i.e., $u\bar{v} = \mathrm{Re}(u\bar{v}) \ge 0$ $\iff$ $u = |u|z$ and $v = |v|z$ for some $z = e^{i\theta}$.

8.6 Consider the Jordan canonical form $Q^{-1}A^TQ = J$ of $A^T$. By taking the transpose of this equation, one gets $P^{-1}AP = J^T$, where $P = (Q^T)^{-1}$. Let $\tilde{P}$ be the matrix obtained from $P$ by reversing the order of the column vectors in each group corresponding to each Jordan block of $J^T$. Then it is easy to see that $\tilde{P}^{-1}A\tilde{P} = J = Q^{-1}A^TQ$. That is, $A$ and $A^T$ have the same Jordan canonical form $J$, which means that the eigenspaces have the same dimension.

8.7 The eigenvalue is $-1$ of multiplicity 3 and has only one linearly independent eigenvector $(1, 0, 3)$. The solution is
yi(t) -1 - 5t + 2t?
y(t)=| w(t) |=| -14t
y3(t) 1 - 15t + 6t?

8.8 See Problem 6.2.

8.9 Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $A$. Then
$$f(\lambda) = \det(\lambda I - A) = (\lambda - \lambda_1)\cdots(\lambda - \lambda_n).$$
Thus, $f(B) = (B - \lambda_1I_m)\cdots(B - \lambda_nI_m)$ is non-singular if and only if $B - \lambda_iI_m$, $i = 1, \ldots, n$, are all non-singular. That is, none of the $\lambda_i$'s is an eigenvalue of $B$.

8.10 The characteristic polynomial of $A$ is $f(\lambda) = (\lambda - 1)(\lambda - 2)^2$, and the remainder is
$$104A^2 - 228A + 138I = \begin{bmatrix} 14 & 0 & 84\\ 0 & 98 & 0\\ 0 & 0 & 98 \end{bmatrix}.$$
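The closed form for $A^k$ in Problem 8.4(1) can be compared against repeated multiplication in exact integer arithmetic. The sketch takes $A = D + J$ with $D = \mathrm{diag}(2, 2, 2, 1)$ and the nilpotent $J$ above, and checks $k = 2, \ldots, 11$; the binomial coefficient $\binom{k}{2}$ comes from `math.comb`.

```python
from math import comb

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

# A = D + J with D = diag(2, 2, 2, 1) and J nilpotent (J^3 = 0).
A = [[2, 1, 0, 0],
     [0, 2, 1, 0],
     [0, 0, 2, 0],
     [0, 0, 0, 1]]

def closed_form(k):
    """A^k = D^k + k D^{k-1} J + C(k, 2) D^{k-2} J^2, written out (k >= 2)."""
    return [[2 ** k, k * 2 ** (k - 1), comb(k, 2) * 2 ** (k - 2), 0],
            [0, 2 ** k, k * 2 ** (k - 1), 0],
            [0, 0, 2 ** k, 0],
            [0, 0, 0, 1]]

P = A
ok = True
for k in range(2, 12):
    P = matmul(P, A)                  # P = A^k
    ok = ok and P == closed_form(k)
print(ok)
```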


Exercises 8.5 




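8.1 Find the Jordan canonical form of $A$ as $Q^{-1}AQ = J$. Since $A$ is nonsingular, all the diagonal entries $\lambda_i$ of $J$, being the eigenvalues of $A$, are nonzero. Hence, each Jordan block $J_i$ of $J$ is invertible. Now one can easily show that $(Q^{-1}AQ)^{-1} = Q^{-1}A^{-1}Q = J^{-1}$, whose diagonal blocks are the $J_i^{-1}$. Each $J_i^{-1}$ is upper triangular with the single eigenvalue $\lambda_i^{-1}$ and is similar to a Jordan block of the same size, which gives the Jordan form of $A^{-1}$.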


8.3 (x,y) = $(4+i,%). 


ssayue[? 1 ]J=[ 


1 
5 
-6 2 8 —2 0 ł 0 ż 
8.6 (1) Us A= | -3 1 2 0 0 0 2 1 
6 -1 —4 0 —4 i -i 0 
yi (t) = —2e2(1—t) J 4e2(t 1) 
(2) yo(t) = e2(1-t) L 2 2(t—1) 
y3 (t) Z 2e21—t) 22l 1) 
8.7 $y_1(t) = 2(t - 1)e^t$, $y_2(t) = -2te^t$, $y_3(t) = (2t - 1)e^t$.
8.8 (1) $(a - d)^2 + 4bc \neq 0$ or $A = aI$.
8.9 (1) t +¢—11, (2) f+ 26413, (3) -1)( —2t—5). 
1 0 -l 
8.10 (3) A™t=]| 0 § -ż 
0 0 1 
5 —22 101 
8.11 (2) | 0 27 —60 
0 0 87 
8.12 See solution of Problem 8.6. 
8.14 (3) Use (2) and the proof of Theorem 8.11. 
(4) Use (3), Theorem 8.11, and (1) of Section 8.3. 
8.15 (2) $A^k = \begin{bmatrix} 1 & 0 & k\\ 0 & 2^k & 2^k - 1\\ 0 & 0 & 1 \end{bmatrix}$
8.16 True: (1) Schur's Lemma. (2). (4) The Jordan form is $\begin{bmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&2&1\\ 0&0&0&2 \end{bmatrix}$. (7) $J$ and $J^T$ are similar (see the Remark on page 340). (8) By the Cayley-Hamilton Theorem.

False: (3) See (2). (5) By Theorem 6.9. (6) $\det e^A = e^{\mathrm{tr}\,A}$.


Chapter 9 
Problems 
9 3 -4 011 ; A : p 
9.1 (1) 3 -1 1|, OY Pe 0 1|,(68) 
-4 1 4 A eG ie T 
-5 0 2 -1 
9.3 (1) The eigenvalues of $A$ are $1, 2, 11$. (2) The eigenvalues are $17, 0, -3$, and so it is a hyperbolic cylinder. (3) $A$ is singular and the linear form is present; thus the graph is a parabola.
9.5 $B$, with the eigenvalues $2$, $2 + \sqrt{2}$ and $2 - \sqrt{2}$.
9.6 For any $x \neq 0$, $Cx \neq 0$ since $C$ is non-singular. Hence, $x^T(C^TAC)x = (Cx)^TA(Cx) > 0$ since $A$ is positive definite.
9.7 The determinant is the product of the eigenvalues. 
9.9 (1) is indefinite. (2) and (3) are positive definite. 

9.10 (1) local minimum, (2) saddle point. 

9.12 (2) $b_{11} = b_{14} = b_{41} = b_{44} = 1$, and all others are zero.

9.14 Let $D$ be a diagonal matrix, and let $D'$ be obtained from $D$ by interchanging two diagonal entries $d_{ii}$ and $d_{jj}$, $i \neq j$. Let $P$ be the permutation matrix interchanging the $i$-th and $j$-th rows. Then $PDP^T = D'$.
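The statement of 9.14 is easy to see on a small example: conjugating a diagonal matrix by a transposition matrix swaps the corresponding diagonal entries. The diagonal matrix below is a hypothetical example.

```python
def perm_matrix(n, i, j):
    """Permutation matrix obtained by interchanging rows i and j of the identity."""
    P = [[int(r == c) for c in range(n)] for r in range(n)]
    P[i], P[j] = P[j], P[i]
    return P

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def transpose(M):
    return [list(r) for r in zip(*M)]

D = [[5, 0, 0, 0], [0, -1, 0, 0], [0, 0, 3, 0], [0, 0, 0, 0]]   # hypothetical diagonal D
P = perm_matrix(4, 0, 2)
print(matmul(matmul(P, D), transpose(P)))   # diag(3, -1, 5, 0): entries 5 and 3 swapped
```

Since $P^T = P^{-1}$, the same computation shows that $D$ and $D'$ are both similar and congruent.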

9.15 Count the number of distinct inertias $(p, q, k)$. For $n$, the number of inertias with $p = i$ is $n - i + 1$.

9.16 (3) index = 2, signature = 1, and rank = 3. 

9.17 Note that the maximum value of R(x) is the maximum eigenvalue of A, and 
similarly for the minimum value. 

9.18 max=% at +(1/V2, 1/V2), min=4 at +(1/V/2, —-1/V2). 

9.19 (1) max $= 4$ at $\pm\frac{1}{\sqrt{6}}(1, 1, 2)$, min $= -2$ at $\pm\frac{1}{\sqrt{3}}(-1, -1, 1)$;
2) max = 3 at +— (2, 1, 1), min =Q at +— (1, —1, —1). 


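9.22 If $u \in U \cap W$, then $u = \alpha x + \beta y \in W$ for some scalars $\alpha$ and $\beta$. Since $x, y \in U$, $b(u, x) = b(u, y) = 0$. But $b(u, x) = \beta b(y, x) = -\beta$ and $b(u, y) = \alpha b(x, y) = \alpha$. Hence $\alpha = \beta = 0$, that is, $u = 0$.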

9.23 Let $c(x, y) = \frac{1}{2}(b(x, y) + b(y, x))$ and $d(x, y) = \frac{1}{2}(b(x, y) - b(y, x))$. Then $c$ is symmetric, $d$ is skew-symmetric, and $b = c + d$.
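In matrix terms, the decomposition in 9.23 splits the matrix $B$ of $b(x, y) = x^TBy$ into $\frac{1}{2}(B + B^T)$ and $\frac{1}{2}(B - B^T)$. The sketch below uses a hypothetical $3\times 3$ matrix and exact fractions to confirm that $b = c + d$ with $c$ symmetric and $d$ skew-symmetric.

```python
from fractions import Fraction

B = [[1, 4, 0], [2, -1, 3], [6, 5, 2]]    # hypothetical matrix of a bilinear form b

def bilinear(M, x, y):
    """b(x, y) = x^T M y."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

n = len(B)
C = [[Fraction(B[i][j] + B[j][i], 2) for j in range(n)] for i in range(n)]  # symmetric part c
D = [[Fraction(B[i][j] - B[j][i], 2) for j in range(n)] for i in range(n)]  # skew part d

x, y = [1, -2, 3], [0, 1, 4]
print(bilinear(B, x, y) == bilinear(C, x, y) + bilinear(D, x, y))
print(bilinear(C, x, y) == bilinear(C, y, x), bilinear(D, x, y) == -bilinear(D, y, x))
```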


Exercises 9.13 


9.1 


1 1 2 3 3 -2 0 


2 
Og a E 




9.3 


9.5 


9.7 
9.8 


9.10 
9.11 
9.16 


9.18 


(i) If $a = 0 = c$, then $\lambda_i = \pm b$. Thus the conic section is a hyperbola.
(ii) Since we assumed that $b \neq 0$, the discriminant $(a - c)^2 + 4b^2 > 0$. By the symmetry of the equation in $x$ and $y$, we may assume that $a - c \ge 0$.

If $a - c = 0$, then $\lambda_i = a \pm b$. Thus, the conic section is an ellipse if $\lambda_1\lambda_2 = a^2 - b^2 > 0$, or a hyperbola if $a^2 - b^2 < 0$. If $\lambda_1\lambda_2 = a^2 - b^2 = 0$, then it is a parabola when $\lambda_1 \neq 0$ and $e' \neq 0$, or a line or two lines in the other cases.

If $a - c > 0$, let $r^2 = (a - c)^2 + 4b^2 > 0$. Then $\lambda_i = \frac{a + c \pm r}{2}$ for $i = 1, 2$. Hence, $4\lambda_1\lambda_2 = (a + c)^2 - r^2 = 4(ac - b^2)$. Thus, the conic section is an ellipse if $\det A = ac - b^2 > 0$, or a hyperbola if $\det A = ac - b^2 < 0$. If $\det A = ac - b^2 = 0$, it is a parabola, or a line or two lines, depending on the possible values of $d'$, $e'$ and the eigenvalues.
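The case analysis in 9.3 amounts to computing the eigenvalues $\lambda_{1,2} = \frac{a + c \pm r}{2}$ of $\begin{bmatrix} a & b\\ b & c\end{bmatrix}$ and looking at the sign of $\lambda_1\lambda_2 = ac - b^2$. The sketch below assumes the quadratic part is written as $ax^2 + 2bxy + cy^2$ (the form whose matrix has the stated eigenvalues) and does not try to resolve the degenerate cases, which also depend on $d'$ and $e'$.

```python
import math

def conic_type(a, b, c):
    """Eigenvalues of [[a, b], [b, c]] and the conic type read off from det = ac - b^2."""
    r = math.sqrt((a - c) ** 2 + 4 * b ** 2)
    lam1, lam2 = (a + c + r) / 2, (a + c - r) / 2
    det = a * c - b * b                    # equals lam1 * lam2
    if det > 0:
        kind = "ellipse"
    elif det < 0:
        kind = "hyperbola"
    else:
        kind = "parabola, or a line or two lines, depending on d' and e'"
    return (lam1, lam2), kind

print(conic_type(0, 1, 0))    # case (i): eigenvalues +-b, hyperbola
print(conic_type(2, 1, 2))    # a = c: eigenvalues a +- b = 3, 1
print(conic_type(1, 2, 4))    # det = 0
```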
If $\lambda$ is an eigenvalue of $A$, then $\lambda^2$ and $\frac{1}{\lambda}$ are eigenvalues of $A^2$ and $A^{-1}$, respectively.
Note $x^T(A + B)x = x^TAx + x^TBx$.
(2) {(2, 1, 2), (-1, —2, 2), (1, 0, O)}. 

1 


(1) Q = 1 E . The form is indefinite with eigenvalues A = 5 and 
A= 

2 -1 3 9 1 2 
(1) A wg o @8=[9 shoei 4] 


(2) The signature is 1, the index is 2, and the rank is 3. 
(2) The point (1,7) is a critical point, and the Hessian is | f E . Hence, 


f(1,7) is a local maximum. 


False: (5) Consider a bilinear form $b(x, y) = x_1y_1 - x_2y_2$ on $\mathbb{R}^2$.
(7) The identity $I$ is congruent to $k^2I$ for all nonzero $k \in \mathbb{R}$. (8) See (7).
(9) Consider a bilinear form $b(x, y) = x_1y_2$. Its matrix $Q = \begin{bmatrix} 0 & 1\\ 0 & 0 \end{bmatrix}$ is not diagonalizable.

True: (1). (2). (3). (4). (6). (10). (11).


